Abstract. Given a distribution $\rho$ on persistence diagrams and observations $X_1, \ldots, X_n \overset{iid}{\sim} \rho$, we introduce an algorithm in this paper that estimates a Frechet mean from the set of diagrams $X_1, \ldots, X_n$. If the underlying measure $\rho$ is a combination of Dirac masses $\rho = \frac{1}{m}\sum_{i=1}^m \delta_{Z_i}$, then we prove that the algorithm converges to a local minimum, and we prove a law of large numbers result for a Frechet mean computed by the algorithm given observations drawn iid from $\rho$. We illustrate the convergence of an empirical mean computed by the algorithm to a population mean by simulations from Gaussian random fields.

1. Introduction

There has been a recent effort in topological data analysis (TDA) to incorporate ideas from stochastic modeling. Much of this work involved the study of random abstract simplicial complexes generated from stochastic processes [22, 15, 12] and non-asymptotic bounds on the convergence or consistency of topological summaries as the number of points increases [19, 20, 2]. The central idea in these papers has been to study statistical properties of topological summaries of point cloud data.

In [16] it was shown that a commonly used topological summary, the persistence diagram [8], admits a well-defined notion of probability distributions and notions such as expectations, variances, percentiles and conditional probabilities. The key contribution of this paper is characterizing Frechet means and variances of finitely many persistence diagrams and providing an algorithm for estimating them. Existence of these means and variances was previously shown; however, a procedure to compute them was not provided.

In this paper we state an algorithm which, given an observed set of persistence diagrams $X_1, \ldots, X_n$, computes a new diagram which is a local minimum of the Frechet function of the empirical measure $\rho_n := n^{-1}\sum_{i=1}^n \delta_{X_i}$. In the case where the diagrams are sampled independently and identically from a probability measure that is a finite combination of Dirac masses, we provide a (weak) law of large numbers for the local minima computed by the algorithm we propose.
2. Persistence diagrams and Alexandrov spaces with curvature bounded from below

In this section we state properties of the space of persistence diagrams that we will use in the subsequent sections. We first define persistence diagrams and the $L^2$-Wasserstein metric on the set of persistence diagrams. Note that this is not the same metric as was used in [16]. We discuss the relation between the two metrics, and why we work with the $L^2$-Wasserstein metric, later in this section. We then show that the space of persistence diagrams is a geodesic space and, specifically, an Alexandrov space with curvature bounded from below. We show that the Frechet function in this space is semiconcave, which allows us to define supporting vectors that serve as an analog of the gradient. The supporting vectors will be used in the algorithm developed in the following section to find local minima; the algorithm is a gradient descent based method.

2.1. Persistent homology and persistence diagrams. Consider a topological space $\mathbb{X}$ and a bounded continuous function $f: \mathbb{X} \to \mathbb{R}$. For a threshold $a$ we define sublevel sets $\mathbb{X}_a = f^{-1}(-\infty, a]$. For $a \le b$ the inclusions $\mathbb{X}_a \subseteq \mathbb{X}_b$ induce homomorphisms of the homology groups of sublevel sets:

$$f_\ell^{a,b}: H_\ell(\mathbb{X}_a) \to H_\ell(\mathbb{X}_b),$$

for each dimension $\ell$. We assume the function $f$ is tame, which means that $f_\ell^{c-\delta,c}$ fails to be an isomorphism for any $\delta > 0$ at only a finite number of values $c$, for all dimensions $\ell$, and that $H_\ell(\mathbb{X}_a)$ is finitely generated for all $a \in \mathbb{R}$. We also assume that the homology groups are defined over field coefficients, e.g. $\mathbb{Z}_2$.

By the tameness assumption the image $F_\ell^{a,b} := \operatorname{Im} f_\ell^{a-\delta,b} \subseteq H_\ell(\mathbb{X}_b)$ is independent of $\delta > 0$ if $\delta$ is small enough. The quotient group

$$B_\ell^a = H_\ell(\mathbb{X}_a)/F_\ell^{a,a}$$

is the cokernel of $f_\ell^{a-\delta,a}$ and captures homology classes which did not exist in sublevel sets preceding $\mathbb{X}_a$. This group is called the $\ell$-th birth group at $\mathbb{X}_a$, and we say that a homology class $\alpha \in H_\ell(\mathbb{X}_a)$ is born at $\mathbb{X}_a$ if its projection onto $B_\ell^a$ is nontrivial.

Consider the map

$$g_\ell^{a,b}: B_\ell^a \to H_\ell(\mathbb{X}_b)/F_\ell^{a,b}$$

and denote its kernel as $D_\ell^{a,b}$. The kernel captures homology classes that were born at $\mathbb{X}_a$ but at $\mathbb{X}_b$ are homologous to homology classes born before $\mathbb{X}_a$. We say that a homology class $\alpha \in H_\ell(\mathbb{X}_a)$ that was born at $\mathbb{X}_a$ dies entering $\mathbb{X}_b$ if its projection onto $B_\ell^a$ lies in $D_\ell^{a,b}$ but not in $D_\ell^{a,b-\delta}$ for all sufficiently small $\delta > 0$. We also call $b$ a degree-$r$ death value of $B_\ell^a$ if $\operatorname{rank} D_\ell^{a,b} - \operatorname{rank} D_\ell^{a,b-\delta} = r > 0$ for all sufficiently small $\delta > 0$.

If a homology class $\alpha$ is born at $\mathbb{X}_a$ and dies entering $\mathbb{X}_b$, we set $b(\alpha) = a$ and $d(\alpha) = b$ and represent the births and deaths of $\ell$-dimensional homology classes by a multiset of points in $\mathbb{R}^2$, with the horizontal axis corresponding to the birth of a class, the vertical axis corresponding to the death of a class, and the multiplicity of a point being the degree of the death value. The idea of a persistence diagram is to consider a basis of persistent homology classes $\{\alpha\}$ and to represent each persistent homology class $\alpha$ by a point $(b(\alpha), d(\alpha))$.

The persistence of $\alpha$ is the difference $\operatorname{pers}(\alpha) = d(\alpha) - b(\alpha)$. In the general setting we could have points with infinite persistence, which correspond to points of the form $(-\infty, y)$ or $(x, \infty)$. These points are infinitely far from all points of finite persistence and hence would have to be treated separately: the space of persistence diagrams would be disconnected, with each component corresponding to a number of points at infinity. For the sake of clarity we restrict ourselves to the case where all classes have finite persistence. This can be achieved by considering extended persistence, but for simplicity we can simply kill everything by setting $f_\ell^{a,b} = 0$ if $b > \sup_{x \in \mathbb{X}} f(x)$.

After establishing some notation we can define persistence diagrams and the distance between two diagrams. Let $\Delta = \{(x, y) \in \mathbb{R}^2 \mid x = y\}$ be the diagonal in $\mathbb{R}^2$. Let $\|x - y\|$ be the usual Euclidean distance if $x$ and $y$ are off-diagonal points. With a slight abuse of notation, let $\|x - \Delta\|$ denote the perpendicular distance between $x$ and the diagonal, and let $\|\Delta - \Delta\| = 0$.

Definition 2.1. A persistence diagram is a countable multiset of points in $\mathbb{R}^2$ along with infinitely many copies of the diagonal $\Delta = \{(x, y) \in \mathbb{R}^2 \mid x = y\}$. We also require for the countably many points $x_j \in \mathbb{R}^2$ not lying on the diagonal that $\sum_j \|x_j - \Delta\| < \infty$.

Each point $p = (a, b)$ in a persistence diagram corresponds to some homology class $\alpha$ with $b(\alpha) = a$ and $d(\alpha) = b$. As a slight abuse of notation we say that $p$ is born at $b(p) := b(\alpha)$ and dies at $d(p) := d(\alpha)$.

We denote the set of all persistence diagrams by $\mathcal{D}$. One metric on $\mathcal{D}$ is the $L^2$-Wasserstein metric

$$d_{L^2}(X, Y)^2 = \inf_{\phi: X \to Y} \sum_{x \in X} \|x - \phi(x)\|^2. \qquad (1)$$

Here we consider all possible bijections $\phi$ between the off-diagonal points and copies of the diagonal in $X$ and the off-diagonal points and copies of the diagonal in $Y$. Bijections always exist, as any point can be paired to the diagonal. We call a bijection optimal if it achieves this infimum.

In much of the computational topology literature the following $p$-th Wasserstein distance between two persistence diagrams $X$ and $Y$ is used:

$$d_{W_p}(X, Y) = \left( \inf_{\phi: X \to Y} \sum_{x \in X} \|x - \phi(x)\|_\infty^p \right)^{1/p}.$$
In [16] the above metric was used to define the following space of persistence diagrams

$$\mathcal{D}_p = \{X \mid d_{W_p}(X, \emptyset) < \infty\},$$

with $p \ge 1$, where $\emptyset$ is the diagram with just the diagonal. It was shown in [16, Thm. 6 and 10] that $\mathcal{D}_p$ is a complete separable metric space, so probability measures on this space can be defined. Given a probability measure $\rho$ on $\mathcal{D}_p$, the existence of a Frechet mean was proven under restrictions on the space of persistence diagrams $\mathcal{D}_p$ [16, Thm. 21 and Lemma 27]. The basic requirement is that $\rho$ has a finite second moment and has compact support or is concentrated on a set with compact support.

In this paper we focus on the $L^2$-Wasserstein metric since it leads to a geodesic space with some known structure. Thus we consider the space of persistence diagrams

$$\mathcal{D}_{L^2} = \{X \mid d_{L^2}(X, \emptyset) < \infty\}.$$

The results stated in the previous paragraph also hold for $\mathcal{D}_{L^2}$ with metric $d_{L^2}$, including existence of Frechet means. This follows from the fact that for any $x, y \in \mathbb{R}^2$

$$\|x - y\|_\infty \le \|x - y\|_2 \le \sqrt{2}\, \|x - y\|_\infty, \qquad (2)$$

so $d_{W_2}(X, Y) \le d_{L^2}(X, Y) \le \sqrt{2}\, d_{W_2}(X, Y)$. This inequality, coupled with the results in [7], implies the following stability result for the $L^2$-Wasserstein distance.

Theorem 2.2. Let $\mathbb{X}$ be a triangulable, compact metric space such that $d_{W_k}(\mathrm{Diag}(h), \emptyset)^k \le C_{\mathbb{X}}$ for any tame Lipschitz function $h: \mathbb{X} \to \mathbb{R}$ with Lipschitz constant 1, where $\mathrm{Diag}(h)$ denotes the persistence diagram of $h$, $k \in [1, 2)$, and $C_{\mathbb{X}}$ is a constant depending only on the space $\mathbb{X}$. Then for two tame Lipschitz functions $f, g: \mathbb{X} \to \mathbb{R}$ we have

$$d_{L^2}(\mathrm{Diag}(f), \mathrm{Diag}(g)) \le \sqrt{2^{k+2}\, C\, \|f - g\|_\infty^{2-k}},$$

where $C = C_{\mathbb{X}} \max\{\mathrm{Lip}(f)^k, \mathrm{Lip}(g)^k\}$.

For ease of notation in the rest of the paper we denote $d_{L^2}(X, Y)^2$ as $d(X, Y)^2$.

Proposition 2.3. For any diagrams $X, Y \in \mathcal{D}_{L^2}$ the infimum in (1) is always achieved.

We prove this proposition in the appendix.

We now show that the space of persistence diagrams with the above metric is a geodesic space. A rectifiable curve $\gamma: [0, 1] \to \mathcal{M}$ in a metric space $\mathcal{M}$ is called a geodesic if it is locally minimizing and parametrized proportionally to arc length. If $\gamma$ is also globally minimizing, then it is said to be minimal. A metric space is a geodesic space if every pair of points is connected by a minimal geodesic. Now consider diagrams $X = \{x\}$ and $Y = \{y\}$ and some optimal pairing $\phi$ between the points in $X$ and $Y$. Let $\gamma: [0, 1] \to \mathcal{D}_{L^2}$ be the path from $X$ to $Y$ where $\gamma(t)$ is the diagram whose points have travelled in a straight line from the point $x$ (which can be a copy of the diagonal) towards the point $\phi(x)$ (which can be a copy of the diagonal) for a distance of $t\|x - \phi(x)\|$. In other words, the diagram with points $\{(1 - t)x + t\phi(x) \mid x \in X\}$¹ is a geodesic from $X$ to $Y$. The proof of this is the observation that $\phi_t: X \to \gamma(t)$, where

$$\phi_t(x) = (1 - t)x + t\phi(x), \qquad (3)$$

is optimal.
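The construction of $\gamma(t)$ can be checked numerically on tiny diagrams. The sketch below is our own illustration, not code from the paper: it encodes a copy of the diagonal as `None`, computes $d^2$ from (1) by exhaustive search over bijections (so it only handles a handful of points), and builds $\gamma(t)$ from a pairing using the rule in footnote 1. The helper names `proj`, `d_sq` and `geodesic_point` are hypothetical.

```python
import itertools
import math

def proj(p):
    """Closest point on the diagonal to p = (birth, death)."""
    m = (p[0] + p[1]) / 2
    return (m, m)

def dist2(p, q):
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2

def d_sq(X, Y):
    """Squared L^2-Wasserstein distance (1) by exhaustive search over
    bijections; None is a copy of the diagonal. Tiny diagrams only."""
    rows = list(X) + [None] * len(Y)
    cols = list(Y) + [None] * len(X)
    best = math.inf
    for perm in itertools.permutations(cols):
        cost = 0.0
        for r, c in zip(rows, perm):
            if r is not None and c is not None:
                cost += dist2(r, c)
            elif r is not None:
                cost += dist2(r, proj(r))
            elif c is not None:
                cost += dist2(c, proj(c))
        best = min(best, cost)
    return best

def geodesic_point(pairs, t):
    """gamma(t) for a pairing given as (x, phi(x)) tuples. A diagonal partner
    is replaced by the closest diagonal point to the other point (footnote 1);
    diagonal-to-diagonal pairs are omitted."""
    out = []
    for x, y in pairs:
        if x is None and y is None:
            continue
        if x is None:
            x = proj(y)
        if y is None:
            y = proj(x)
        out.append(((1 - t) * x[0] + t * y[0], (1 - t) * x[1] + t * y[1]))
    return out

# An optimal pairing for X = {(0,2), (1,5)} and Y = {(0,3)}:
# (1,5) -> (0,3) and (0,2) -> diagonal, with total cost 7.
X = [(0.0, 2.0), (1.0, 5.0)]
Y = [(0.0, 3.0)]
pairs = [((1.0, 5.0), (0.0, 3.0)), ((0.0, 2.0), None)]
mid = geodesic_point(pairs, 0.5)   # [(0.5, 4.0), (0.5, 1.5)]
```

For this example the midpoint satisfies $d(X, \gamma(1/2))^2 = d(\gamma(1/2), Y)^2 = \frac{1}{4} d(X, Y)^2$, as the geodesic property predicts.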

2.2. Gradients and supporting vectors on $\mathcal{D}_{L^2}$. We will propose a gradient descent based algorithm to compute Frechet means. To analyze and understand the algorithm we need to understand the structure of $\mathcal{D}_{L^2}$. We will show that $\mathcal{D}_{L^2}$ is an Alexandrov space with curvature bounded from below (see [5] for more information on these spaces). This result is not surprising, since there are known relations between $L^2$-Wasserstein spaces and Alexandrov spaces with curvature bounded from below [21, 13]. The motivating idea behind these spaces was to generalize results of Riemannian geometry to metric spaces without Riemannian structure.

The properties and behavior of Frechet means are closely related to the curvature of the space. For metric spaces with curvature bounded from above, called CAT-spaces², properties of Frechet means have been investigated and there exist algorithms to compute Frechet means [25]. $\mathcal{D}_{L^2}$ is not a CAT-space (see Proposition 2.4); it is, however, an Alexandrov space with curvature bounded from below. Less is known about properties of Frechet means in these spaces, or about algorithms to compute them. We use the structure of Alexandrov spaces with curvature bounded from below to compute estimates of Frechet means and provide some analysis of these estimates. Note that Frechet means are the same as barycenters, which is the term used in much of the literature.

We first confirm that $\mathcal{D}_{L^2}$ is not a CAT-space.
Proposition 2.4. $\mathcal{D}_{L^2}$ is not in CAT($k$) for any $k > 0$.

Proof. If $\mathcal{D}_{L^2} \in$ CAT($k$) then for all $X, Y \in \mathcal{D}_{L^2}$ with $d(X, Y)^2 < \pi^2/k$ there is a unique geodesic between them [3, Proposition 2.11]. However, we can find $X, Y$ arbitrarily close with two distinct geodesics. One example is taking $X$ to be a diagram with two diagonally opposite corners of a square (placed away from the diagonal) and $Y$ a diagram with the other two corners. The horizontal and vertical paths are equally optimal, and we may choose the square to be as small as we wish. □
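The square example in the proof can be made concrete. In this sketch (hypothetical helper names; the square is placed far from the diagonal so that no matching to the diagonal can compete) the horizontal and vertical pairings have the same cost, yet their midpoints are different diagrams, so they induce two distinct geodesics.

```python
import math

def pairing_cost(pairs):
    """Sum of squared Euclidean distances over (x, phi(x)) pairs."""
    return sum((x[0] - y[0]) ** 2 + (x[1] - y[1]) ** 2 for x, y in pairs)

def midpoint(pairs):
    """Points of gamma(1/2) for a pairing, sorted so diagrams compare as multisets."""
    return sorted(((x[0] + y[0]) / 2, (x[1] + y[1]) / 2) for x, y in pairs)

eps = 0.1                          # side of the square; may be taken arbitrarily small
b, d = 0.0, 10.0                   # corner of the square, far from the diagonal
X = [(b, d), (b + eps, d + eps)]   # two diagonally opposite corners
Y = [(b + eps, d), (b, d + eps)]   # the other two corners

horizontal = [(X[0], Y[0]), (X[1], Y[1])]   # points move horizontally
vertical = [(X[0], Y[1]), (X[1], Y[0])]     # points move vertically

# Any pairing that uses the diagonal costs at least the smallest squared
# distance from a corner to the diagonal, which is far larger here.
diag_cost = min((p[1] - p[0]) ** 2 / 2 for p in X + Y)
```

Both pairings cost $2\varepsilon^2$ (up to floating point rounding), which is why `math.isclose` rather than `==` is the right comparison.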

The following inequality characterizes Alexandrov spaces with curvature bounded from below by zero [21]. Given a geodesic space $\mathcal{M}$ with metric $d'$, for any geodesic $\gamma: [0, 1] \to \mathcal{M}$ from $X$ to $Y$ and any $Z \in \mathcal{M}$,

$$d'(Z, \gamma(t))^2 \ge t\, d'(Z, Y)^2 + (1 - t)\, d'(Z, X)^2 - t(1 - t)\, d'(X, Y)^2. \qquad (4)$$

¹ If both $x$ and $\phi(x)$ are the diagonal then this is the diagonal. If exactly one of $x$ or $\phi(x)$ is the diagonal then we replace it in this sum by the closest point in the diagonal to $\phi(x)$ or $x$ respectively.
² Terminology given by Gromov [9] that stands for Cartan, Alexandrov, and Toponogov.
We now show that $\mathcal{D}_{L^2}$ is a non-negatively curved Alexandrov space.

Theorem 2.5. The space of persistence diagrams $\mathcal{D}_{L^2}$ with metric $d$ given in (1) is a non-negatively curved Alexandrov space.

Proof. First observe that $\mathcal{D}_{L^2}$ is a geodesic space. Let $\gamma: [0, 1] \to \mathcal{D}_{L^2}$ be a geodesic from $X$ to $Y$ and let $Z \in \mathcal{D}_{L^2}$ be any diagram. We want to show that inequality (4) holds.

Let $\phi$ be an optimal bijection between $X$ and $Y$ which induces the geodesic $\gamma$. That is, $\gamma(t) = \{(1 - t)x + t\phi(x) \mid x \in X\}$, and define $\phi_t(x) = (1 - t)x + t\phi(x)$ as in (3). Let $\phi_Z^t: Z \to \gamma(t)$ be optimal. Construct bijections $\phi_Z^X: Z \to X$ and $\phi_Z^Y: Z \to Y$ by $\phi_Z^X = (\phi_t)^{-1} \circ \phi_Z^t$ and $\phi_Z^Y = \phi \circ \phi_Z^X$. There is no reason to suppose that either of the bijections $\phi_Z^X$ or $\phi_Z^Y$ is optimal. Note that if $\phi_Z^t(z) = \Delta$ then $\phi_Z^X(z) = \Delta$ and $\phi_Z^Y(z) = \Delta$.

From the formula for the distance in $\mathcal{D}_{L^2}$ we observe

$$d(Z, X)^2 \le \sum_{z \in Z} \|z - \phi_Z^X(z)\|^2, \qquad d(Z, Y)^2 \le \sum_{z \in Z} \|z - \phi_Z^Y(z)\|^2, \qquad (5)$$

$$d(X, Y)^2 = \sum_{z \in Z} \|\phi_Z^X(z) - \phi(\phi_Z^X(z))\|^2 = \sum_{z \in Z} \|\phi_Z^X(z) - \phi_Z^Y(z)\|^2,$$

and, since $\phi_Z^t(z) = (1 - t)\phi_Z^X(z) + t\phi_Z^Y(z)$ and $\phi_Z^t$ is optimal, $d(Z, \gamma(t))^2 = \sum_{z \in Z} \|z - [(1 - t)\phi_Z^X(z) + t\phi_Z^Y(z)]\|^2$.

Euclidean space has curvature zero everywhere, so for each $z$ in the diagram $Z$ and all $t \in [0, 1]$ we have

$$\|z - [(1 - t)\phi_Z^X(z) + t\phi_Z^Y(z)]\|^2 = t\|z - \phi_Z^Y(z)\|^2 + (1 - t)\|z - \phi_Z^X(z)\|^2 - t(1 - t)\|\phi_Z^X(z) - \phi_Z^Y(z)\|^2.$$

Summing these equalities over $z \in Z$ and combining them with the inequalities (5) gives us the desired result. □
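The Euclidean identity used in the proof is the flat case of (4), holding with equality. A quick randomized check, with `a` and `b` standing in for $\phi_Z^X(z)$ and $\phi_Z^Y(z)$ (the names are chosen for this sketch only):

```python
import random

def sq(p, q):
    """Squared Euclidean distance in the plane."""
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2

def identity_gap(z, a, b, t):
    """LHS - RHS of
    ||z - ((1-t)a + tb)||^2 = t||z-b||^2 + (1-t)||z-a||^2 - t(1-t)||a-b||^2."""
    p = ((1 - t) * a[0] + t * b[0], (1 - t) * a[1] + t * b[1])
    lhs = sq(z, p)
    rhs = t * sq(z, b) + (1 - t) * sq(z, a) - t * (1 - t) * sq(a, b)
    return lhs - rhs

rng = random.Random(0)
worst = 0.0
for _ in range(1000):
    z, a, b = [(rng.uniform(-5, 5), rng.uniform(-5, 5)) for _ in range(3)]
    t = rng.random()
    worst = max(worst, abs(identity_gap(z, a, b, t)))
print(f"largest |LHS - RHS| over 1000 random trials: {worst:.2e}")
```

The gap stays at floating point rounding level; summing the identity over $z \in Z$ and using (5) is what turns it into the inequality (4) for diagrams.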

2.3. Properties of the Frechet function. Given a probability distribution $\rho$ on $\mathcal{D}_{L^2}$ we can define the corresponding Frechet function to be

$$F: \mathcal{D}_{L^2} \to \mathbb{R}, \qquad Y \mapsto \int_{\mathcal{D}_{L^2}} d(X, Y)^2 \, d\rho(X).$$

The Frechet mean set of $\rho$ is the set of all the minimizers of the map $F$ on $\mathcal{D}_{L^2}$. If there is a unique minimizer then this is called the Frechet mean of $\rho$. The variance is then defined to be the infimum of the above functional.

We will show that the Frechet function has the nice property of being semiconcave. For an Alexandrov space $\Omega$, a locally Lipschitz function $f: \Omega \to \mathbb{R}$ is called $\lambda$-concave if for any unit speed geodesic $\gamma$ in $\Omega$ the function

$$f \circ \gamma(t) - \lambda t^2/2$$

is concave. A function $f: \Omega \to \mathbb{R}$ is called semiconcave if for any point $x \in \Omega$ there is a neighborhood $\Omega_x$ of $x$ and $\lambda \in \mathbb{R}$ such that the restriction $f|_{\Omega_x}$ is $\lambda$-concave.

Proposition 2.6. If the support of $\rho$ is bounded (that is, has bounded diameter) then the corresponding Frechet function is semiconcave.



Proof. We first show that if the support of a probability distribution $\rho$ is bounded then the corresponding Frechet function is Lipschitz on any set with bounded diameter. We then show that for any unit speed geodesic $\gamma$ and any $X \in \mathcal{D}_{L^2}$ the function

$$g_X(s) := d(\gamma(s), X)^2 - s^2$$

is concave. We complete the proof by showing that the Frechet function $F$ is 2-concave at every point (and hence $F$ is semiconcave), writing $F(\gamma(s)) - s^2$ as $\int g_X(s)\, d\rho(X)$.

Let $U$ be a subset of $\mathcal{D}_{L^2}$ with bounded diameter. Since the support of $\rho$ is also bounded, there is some $K$ such that for any $Y \in U$ we have $\int d(X, Y)\, d\rho(X) < K$. Let $Y, Z \in U$. Then, using the triangle inequality $|d(X, Y) - d(X, Z)| \le d(Z, Y)$,

$$\begin{aligned} |F(Y) - F(Z)| &= \left| \int d(X, Y)^2 - d(X, Z)^2 \, d\rho(X) \right| \\ &\le \int |d(X, Y) - d(X, Z)|\,(d(X, Z) + d(X, Y))\, d\rho(X) \\ &\le \int d(Z, Y)\,(d(X, Z) + d(X, Y))\, d\rho(X) \\ &\le 2K\, d(Z, Y). \end{aligned}$$

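Let $\gamma$ be a unit speed geodesic and $X \in \mathcal{D}_{L^2}$. Consider the function

$$g_X(s) := d(\gamma(s), X)^2 - s^2.$$

We want to show that $g_X$ is concave, which means that $g_X(tx + (1 - t)y) \ge t g_X(x) + (1 - t) g_X(y)$. Let $\tilde\gamma(t) := \gamma(tx + (1 - t)y)$, $t \in [0, 1]$, be the geodesic from $\gamma(y)$ to $\gamma(x)$ traveling along $\gamma$; since $\gamma$ has unit speed, $d(\tilde\gamma(0), \tilde\gamma(1)) = |x - y|$. Then

$$\begin{aligned} t g_X(x) + (1 - t) g_X(y) &= t\, d(\tilde\gamma(1), X)^2 + (1 - t)\, d(\tilde\gamma(0), X)^2 - t x^2 - (1 - t) y^2 \\ &\le d(\tilde\gamma(t), X)^2 + t(1 - t)\, d(\tilde\gamma(0), \tilde\gamma(1))^2 - t x^2 - (1 - t) y^2 \\ &= d(\tilde\gamma(t), X)^2 + t(1 - t)(x - y)^2 - t x^2 - (1 - t) y^2 \\ &= d(\gamma(tx + (1 - t)y), X)^2 - (tx + (1 - t)y)^2 \\ &= g_X(tx + (1 - t)y). \end{aligned}$$

The inequality comes from the defining inequality (4) that makes $\mathcal{D}_{L^2}$ a non-negatively curved Alexandrov space.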
By the construction of $g_X$ we can think of $F(\gamma(s)) - s^2$ as $\int g_X(s)\, d\rho(X)$. This means that we can write

$$t[F(\gamma(x)) - x^2] + (1 - t)[F(\gamma(y)) - y^2] = \int t g_X(x) + (1 - t) g_X(y)\, d\rho(X).$$

The concavity of $g_X$ ensures that $t g_X(x) + (1 - t) g_X(y) \le g_X(tx + (1 - t)y)$, and hence

$$t[F(\gamma(x)) - x^2] + (1 - t)[F(\gamma(y)) - y^2] \le \int g_X(tx + (1 - t)y)\, d\rho(X) = F(\gamma(tx + (1 - t)y)) - (tx + (1 - t)y)^2.$$

□

We now define the additional structure on Alexandrov spaces with curvature bounded from below that we will need to define gradients and supporting vectors. This exposition is a summary of the content in [21, 24].

Given a point $Y$ in an Alexandrov space $A$ with non-negative curvature, we first define the tangent cone $T_Y$. Let $\Sigma_Y$ be the set of all nontrivial unit-speed geodesics emanating from $Y$. For $\gamma, \eta \in \Sigma_Y$ the angle between them is defined by

$$\angle_Y(\gamma, \eta) := \arccos\left( \lim_{s, t \downarrow 0} \frac{s^2 + t^2 - d(\gamma(s), \eta(t))^2}{2st} \right)$$

when the limit exists. We define the space of directions $(\Sigma_Y, \angle_Y)$ at $Y$ as the completion of $\Sigma_Y/\sim$ with respect to $\angle_Y$, where $\gamma \sim \eta$ if $\angle_Y(\gamma, \eta) = 0$. The tangent cone $T_Y$ is the Euclidean cone over $\Sigma_Y$:

$$T_Y := \Sigma_Y \times [0, \infty) / \Sigma_Y \times \{0\},$$

$$d_{T_Y}((\gamma, s), (\eta, t))^2 := s^2 + t^2 - 2st \cos \angle_Y(\gamma, \eta).$$

The inner product of $u = (\gamma, s)$ and $v = (\eta, t) \in T_Y$ is defined as

$$\langle u, v \rangle_Y := st \cos \angle_Y(\gamma, \eta) = \frac{1}{2}\left[ s^2 + t^2 - d_{T_Y}(u, v)^2 \right].$$

A geometric description of the tangent cone $T_Y$ is as follows. $Y \in \mathcal{D}_{L^2}$ has countably many points $\{y_i\}$ off the diagonal. A tangent vector is a set of vectors $\{v_i \in \mathbb{R}^2\}$, one assigned to each $y_i$, along with countably many vectors at points along the diagonal pointing perpendicular to the diagonal, such that the sum of the squares of the lengths of all these vectors is finite. Observe that there can exist tangent vectors for which the corresponding geodesic does not exist for any positive amount of time. The angle between two tangent vectors is effectively a weighted average of all the angles between the pairs of vectors.

We now define differential structure as a limit of rescalings. For $s > 0$ denote the space $(A, s \cdot d)$ by $sA$ and define the map $i_s: sA \to A$. For an open set $\Omega \subset A$ and any function $f: \Omega \to \mathbb{R}$, the differential of $f$ at a point $p \in \Omega$ is a map $d_p f: T_p \to \mathbb{R}$ defined by

$$d_p f = \lim_{s \to \infty} s(f \circ i_s - f(p)), \qquad f \circ i_s: sA \to \mathbb{R}.$$

For semiconcave functions the above differential is well defined and we can study gradients and supporting vectors.

Definition 2.7 (Gradients and supporting vectors). Given an open set $\Omega \subset A$ and a function $f: \Omega \to \mathbb{R}$, we denote by $\nabla_p f$ the gradient of $f$ at a point $p \in \Omega$. $\nabla_p f$ is the vector $v \in T_p$ such that

(i) $d_p f(x) \le \langle v, x \rangle$ for all $x \in T_p$;

(ii) $d_p f(v) = \langle v, v \rangle$.

For a semiconcave $f$ the gradient exists and is unique (Theorem 1.7 in [H]). We say $s \in T_p$ is a supporting vector of $f$ at $p$ if $d_p f(x) \le -\langle s, x \rangle$ for all $x \in T_p$. Note that $-\nabla_p f$ is a supporting vector if it exists in the tangent cone at $p$.

Lemma 2.8. (i) If $s$ is a supporting vector then $\|s\| \ge \|\nabla_p f\|$.

(ii) If $p$ is a local minimum of $f$ and $s$ is a supporting vector of $f$ at $p$ then $s = 0$.

Proof. (i) First observe that from the definitions of $\nabla_p f$ and supporting vectors we have

$$\langle \nabla_p f, \nabla_p f \rangle = d_p f(\nabla_p f) \le -\langle s, \nabla_p f \rangle.$$

We also know that

$$0 \le \langle \nabla_p f + s, \nabla_p f + s \rangle = \langle \nabla_p f, \nabla_p f \rangle + 2\langle \nabla_p f, s \rangle + \langle s, s \rangle.$$

These inequalities combined tell us that $0 \le -\langle \nabla_p f, \nabla_p f \rangle + \langle s, s \rangle$.

(ii) If $p$ is a local minimum of $f$ then $d_p f(x) \ge 0$ for all $x \in T_p$. In particular $d_p f(s) \ge 0$. Since $s$ is a supporting vector, $-\langle s, s \rangle \ge d_p f(s) \ge 0$. This implies $\langle s, s \rangle = 0$ and hence $s = 0$. □

We care about gradients and supporting vectors because they can help us find local minima of the Frechet function. Indeed, a necessary condition for $F$ to have a local minimum at $Y$ is that $s = 0$ for any supporting vector $s$ of $F$ at $Y$. Since the tangent cone at $Y$ is a convex subset of a Hilbert space, we can take integrals over probability measures with values in $T_Y$. This allows us to find a formula for a supporting vector of the Frechet function $F$.

Proposition 2.9. Let $Y \in \mathcal{D}_{L^2}$. For each $X \in \mathcal{D}_{L^2}$ let $F_X: Z \mapsto d(X, Z)^2$.

(i) If $\gamma$ is a distance achieving geodesic from $Y$ to $X$, then the tangent vector to $\gamma$ at $Y$ of length $2d(X, Y)$ is a supporting vector at $Y$ for $F_X$.

(ii) If $s_X$ is a supporting vector at $Y$ for the function $F_X$ for each $X \in \operatorname{supp}(\rho)$, then $s = \int s_X \, d\rho(X)$ is a supporting vector at $Y$ of the Frechet function $F$ corresponding to the distribution $\rho$.
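Proof. (i) Let $\gamma$ be a unit speed geodesic from $Y$ to $X$. Consider the tangent vector $s_X = (\gamma, 2d(X, Y))$. Let $\gamma(t)_i$ denote the point in $\gamma(t)$ that is sent to $x_i \in X$. Since $\gamma$ is a distance achieving geodesic we know that

$$d(X, Y)^2 = \sum_i \|x_i - \gamma(0)_i\|^2 = F_X(Y).$$

To show $d_Y F_X(v) \le -\langle s_X, v \rangle$ for all $v \in T_Y$ it is sufficient to consider vectors of the form $v = (\eta, 1)$, where $\eta$ is a unit speed geodesic starting at $Y$. Let $\eta(t)_i$ denote the point in $\eta(t)$ which started at $\eta(0)_i = \gamma(0)_i$. This means that $x_i \mapsto \eta(t)_i$ is a bijection from $X$ to $\eta(t)$, and

$$\begin{aligned} d_Y F_X(v) &= \frac{d}{dt} F_X(\eta(t)) \Big|_{t=0} = \lim_{t \downarrow 0} \frac{F_X(\eta(t)) - F_X(Y)}{t} \\ &\le \lim_{t \downarrow 0} \frac{1}{t} \left( \sum_i \|x_i - \eta(t)_i\|^2 - \sum_i \|x_i - \eta(0)_i\|^2 \right) \\ &= \lim_{t \downarrow 0} \frac{1}{t} \sum_i \left( \|\eta(0)_i - \eta(t)_i\|^2 - 2\|\eta(0)_i - \eta(t)_i\|\,\|x_i - \eta(0)_i\| \cos\theta_i \right), \end{aligned}$$

where $\theta_i$ is the angle between the paths $s \mapsto \gamma(s)_i$ and $t \mapsto \eta(t)_i$ in the plane. Now

$$\|x_i - \gamma(0)_i\| = \frac{d(X, Y)}{s}\, \|\gamma(s)_i - \gamma(0)_i\|$$

for all $s > 0$, and $\|\eta(0)_i - \eta(t)_i\|^2 = t^2 \|\eta(0)_i - \eta(1)_i\|^2$ for all $t$, so the first term in the sum vanishes in the limit. This implies that

$$d_Y F_X(v) \le -2d(X, Y) \lim_{t, s \downarrow 0} \frac{\sum_i \|\gamma(s)_i - \gamma(0)_i\|\,\|\eta(t)_i - \eta(0)_i\| \cos\theta_i}{st}.$$

Recall from our construction of the tangent cone that

$$\begin{aligned} \langle v, s_X \rangle &= 2d(X, Y) \cos\angle_Y(\gamma, \eta) = 2d(X, Y) \lim_{s, t \downarrow 0} \frac{s^2 + t^2 - d(\gamma(s), \eta(t))^2}{2st} \\ &= 2d(X, Y) \lim_{s, t \downarrow 0} \frac{\sum_i \|\gamma(s)_i - \gamma(0)_i\|^2 + \|\eta(t)_i - \eta(0)_i\|^2 - \|\gamma(s)_i - \eta(t)_i\|^2}{2st} \\ &= 2d(X, Y) \lim_{t, s \downarrow 0} \frac{\sum_i \|\gamma(s)_i - \gamma(0)_i\|\,\|\eta(t)_i - \eta(0)_i\| \cos\theta_i}{st}. \end{aligned}$$

By comparing these equations we get $d_Y F_X(v) \le -\langle v, s_X \rangle$, and thus we can conclude that $s_X$ is a supporting vector.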
(ii) Now let $s_X$ be any supporting vector of $F_X$. By its definition we know that $d_Y F_X(v) \le -\langle s_X, v \rangle$ for all $v \in T_Y$, and hence

$$d_Y F(v) = \int d_Y F_X(v)\, d\rho(X) \le \int -\langle s_X, v \rangle\, d\rho(X) = -\left\langle \int s_X\, d\rho(X), v \right\rangle.$$

□

In the following section we provide an algorithm that computes a local minimum of a Frechet function using a gradient descent procedure. The above results will be used, since computing a supporting vector of $Z \mapsto d(X, Z)^2$ can be significantly easier and faster than computing a supporting vector of $F$ itself.
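As a hedged illustration (our own specialization, not stated in this form above), take diagrams with finitely many off-diagonal points, the empirical measure $\rho_m = \frac{1}{m}\sum_{i=1}^m \delta_{X_i}$, and optimal pairings $\phi_i$ from $Y$ to $X_i$. Assuming tangent vectors add pointwise as in the geometric description of $T_Y$, Proposition 2.9 suggests a supporting vector of $F_m$ whose component at an off-diagonal point $y$ of $Y$ is given below, where a diagonal partner $\phi_i(y) = \Delta$ is read as the closest diagonal point to $y$. Write $e_\parallel = (1,1)/\sqrt{2}$, $e_\perp = (-1,1)/\sqrt{2}$, $y = u e_\parallel + v e_\perp$, and $\phi_i(y) = u_i e_\parallel + v_i e_\perp$ for the $k \ge 1$ off-diagonal partners, indexed $i = 1, \ldots, k$:

```latex
\[
s(y) = \frac{2}{m}\sum_{i=1}^{m}\bigl(\phi_i(y) - y\bigr)
     = \frac{2}{m}\Bigl[\sum_{i=1}^{k}(u_i - u)\Bigr] e_\parallel
     + \frac{2}{m}\Bigl[\sum_{i=1}^{k} v_i - m\,v\Bigr] e_\perp ,
\]
\[
s(y) = 0 \iff u = \frac{1}{k}\sum_{i=1}^{k} u_i ,\quad v = \frac{1}{m}\sum_{i=1}^{k} v_i
\iff y = \frac{k\,\bar w + (m-k)\,\bar w_\Delta}{m},
\]
```

Here $\bar w$ is the mean of the $k$ off-diagonal partners and $\bar w_\Delta$ its closest diagonal point. Under these assumptions, the zero of the supporting vector is exactly the diagonal-aware arithmetic mean used by the algorithm in Section 3, consistent with Lemma 3.1.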

3. Finding local minima of the Frechet function 

In this section we state an algorithm that computes a Frechet mean of a finite 
set of persistence diagrams with finitely many off diagonal points, and examine 
convergence properties of this algorithm. We will restrict our attention to diagrams 
with only finitely many off-diagonal points with multiplicity of the points allowed. 

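Given a set of persistence diagrams $\{X_i\}_{i=1}^m$, a Frechet mean $Y$ is a diagram that satisfies

$$Y \in \operatorname*{arg\,min}_{Z \in \mathcal{D}_{L^2}} F_m(Z), \qquad F_m(Z) := \int_{\mathcal{D}_{L^2}} d(X, Z)^2\, d\rho_m(X),$$

with the empirical measure $\rho_m := m^{-1}\sum_{i=1}^m \delta_{X_i}$.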

We employ a greedy search algorithm based on gradient descent to find a local minimum. A key component of this greedy algorithm (see Algorithm 1) consists of a variant of the Kuhn-Munkres (Hungarian) algorithm [18].

The Hungarian algorithm finds the least cost assignment of tasks to people under the assumption that the number of tasks and people are the same. The input is the cost for each person to do each of the tasks. Suppose we have two diagrams $X$ and $Y$, each with only finitely many off-diagonal points. We add enough copies of the diagonal to $X$ and $Y$ to allow the option of matching every off-diagonal point with the diagonal. We can think of the points and copies of the diagonal in $X$ as the people and the points and copies of the diagonal in $Y$ as the tasks. The cost of $x \in X$ doing task $y \in Y$ is $\|x - y\|^2$. The total cost of an assignment (in other words, a bijection) $\phi$ of tasks to people is $\sum_{x \in X} \|x - \phi(x)\|^2$. The Hungarian algorithm gives us a bijection $\phi$ that minimizes this cost; that is, it gives an optimal pairing between $X$ and $Y$.
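A minimal sketch of the assignment set-up just described, assuming diagrams are given as lists of (birth, death) tuples with finitely many off-diagonal points. It builds the square cost matrix with added diagonal copies and, purely for illustration, finds the minimum-cost assignment by exhaustive search over permutations instead of running the Hungarian algorithm. The names `cost_matrix` and `optimal_pairing` are hypothetical.

```python
import itertools
import math

DIAG = None  # marker for an added copy of the diagonal

def diag_sq(p):
    """Squared perpendicular distance from p = (birth, death) to the diagonal."""
    return (p[1] - p[0]) ** 2 / 2

def cost_matrix(X, Y):
    """Square cost matrix: rows are X's points plus len(Y) diagonal copies,
    columns are Y's points plus len(X) diagonal copies."""
    rows = list(X) + [DIAG] * len(Y)
    cols = list(Y) + [DIAG] * len(X)
    C = []
    for r in rows:
        row = []
        for c in cols:
            if r is DIAG and c is DIAG:
                row.append(0.0)            # diagonal to diagonal is free
            elif r is DIAG:
                row.append(diag_sq(c))     # c is matched to the diagonal
            elif c is DIAG:
                row.append(diag_sq(r))     # r is matched to the diagonal
            else:
                row.append((r[0] - c[0]) ** 2 + (r[1] - c[1]) ** 2)
        C.append(row)
    return rows, cols, C

def optimal_pairing(X, Y):
    """Minimum total cost and an optimal list of (x, y) pairs, found by
    exhaustive search over assignments (illustration only; exponential)."""
    rows, cols, C = cost_matrix(X, Y)
    n = len(rows)
    best_cost, best_perm = math.inf, tuple(range(n))
    for perm in itertools.permutations(range(n)):
        cost = sum((C[i][perm[i]] for i in range(n)), 0.0)
        if cost < best_cost:
            best_cost, best_perm = cost, perm
    pairs = [(rows[i], cols[best_perm[i]]) for i in range(n)
             if rows[i] is not DIAG or cols[best_perm[i]] is not DIAG]
    return best_cost, pairs

cost, pairs = optimal_pairing([(0.0, 2.0), (1.0, 5.0)], [(0.0, 3.0)])
# cost == 7.0: (1, 5) is paired with (0, 3) and (0, 2) with the diagonal
```

For more than a few points the exhaustive search should be replaced by a polynomial-time assignment solver; for example, SciPy's `scipy.optimize.linear_sum_assignment` accepts the same square cost matrix.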

We would like to use the arithmetic mean of points in the plane and some number of copies of the diagonal. If $x_1, \ldots, x_m$ are points in $\mathbb{R}^2$ then their arithmetic mean $w = \frac{1}{m}\sum_{i=1}^m x_i$ is the choice of $z$ that minimizes the sum $\sum_{i=1}^m \|z - x_i\|^2$. If $x_i = \Delta$ for all $i$ then the arithmetic mean is set to be $\Delta$. The final case, without loss of generality, is when $x_1, \ldots, x_k$ are all off-diagonal points and $x_{k+1}, \ldots, x_m$ are all the diagonal. Let $\bar w$ be the usual arithmetic mean of $x_1, \ldots, x_k$ and let $\bar w_\Delta$ be the closest point on the diagonal to $\bar w$. We set

$$w := \frac{k\bar w + (m - k)\bar w_\Delta}{m}$$

to be the arithmetic mean of $x_1, \ldots, x_m$. This is the choice of $z$ that minimizes $\sum_{i=1}^m \|z - x_i\|^2$. We use an operation $\operatorname{mean}_{i=1,\ldots,m}(x_i)$ that computes the arithmetic mean for each pairing over the diagrams.
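The diagonal-aware mean can be written directly from the formula for $w$. In this sketch, `None` marks a copy of the diagonal and `diagram_mean` is a hypothetical name:

```python
def diagram_mean(points):
    """Arithmetic mean of m matched points, where None marks a copy of the
    diagonal: w = (k * wbar + (m - k) * wbar_diag) / m, with wbar the mean of
    the k off-diagonal points and wbar_diag its closest diagonal point.
    Returns None (the diagonal) if every point is the diagonal."""
    m = len(points)
    off = [p for p in points if p is not None]
    k = len(off)
    if k == 0:
        return None
    wbar = (sum(p[0] for p in off) / k, sum(p[1] for p in off) / k)
    c = (wbar[0] + wbar[1]) / 2          # wbar_diag = (c, c)
    return ((k * wbar[0] + (m - k) * c) / m,
            (k * wbar[1] + (m - k) * c) / m)

print(diagram_mean([(0.0, 2.0), None]))   # (0.5, 1.5)
```

The diagonal copies leave the component of $w$ along the diagonal unchanged and shrink only the perpendicular component by the factor $k/m$, which is why the result minimizes $\sum_i \|z - x_i\|^2$ when $\|z - \Delta\|$ is the perpendicular distance.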

Suppose $Y$ is our current estimate for the Frechet mean. Using the Hungarian algorithm we compute optimal pairings between $Y$ and each of the $X_i$. We denote these pairings as $\{(y^j, x_i^j)\}_{j=1}^{J_i}$, where $J_i$ is the number of off-diagonal points in $X_i$ and $Y$ combined. For each $y^j \ne \Delta$ we then consider all the $x_i^j$, and let $\hat y^j$ be the arithmetic mean of the $x_i^j$. Whenever in our pairings $\{(y^j, x_i^j)\}_{j=1}^{J_i}$ we see a pair $(\Delta, x_i^j)$, we think of this as a different copy of the diagonal than in any pairing between $Y$ and $X_k$ with $k \ne i$; we would then be using the arithmetic mean of $m - 1$ copies of the diagonal and $x_i^j$. Let $Y'$ be the diagram with points $\hat y^j$. We will show later that if $Y = Y'$ then $Y$ is a local minimum of the Frechet function. Otherwise we choose $Y'$ to be our current estimate.

The basic steps of Algorithm 1 are to:

(a) randomly initialize the mean diagram; for example, we can start at one of the $m$ persistence diagrams or at the midway point of two of the $m$ diagrams;

(b) use the Hungarian algorithm to compute optimal pairings between the estimate of the mean diagram and each of the persistence diagrams;

(c) update each point in the mean diagram estimate with the arithmetic mean over all diagrams, where each point in the mean diagram is paired with a point (possibly on the diagonal) in each diagram;

(d) if the updated estimate locally minimizes $F_m$ then return the estimate, otherwise return to step (b).
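The following Python sketch strings steps (a) to (d) together for tiny diagrams. It is our own illustration under stated assumptions, not the authors' implementation: exhaustive search replaces the Hungarian algorithm, `None` encodes a copy of the diagonal, points of $X_i$ paired with the diagonal of $Y$ become new points (the mean of that point and $m - 1$ diagonal copies) as described above, and the loop stops when the updated diagram equals the current one (the condition $Y = Y'$) rather than by comparing pairings, or after `max_iter` iterations. All function names are hypothetical.

```python
import itertools
import random

def diag_sq(p):
    """Squared distance from p = (birth, death) to the diagonal."""
    return (p[1] - p[0]) ** 2 / 2

def sq(p, q):
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2

def pair_cost(r, c):
    # None stands for a copy of the diagonal
    if r is None and c is None:
        return 0.0
    if r is None:
        return diag_sq(c)
    if c is None:
        return diag_sq(r)
    return sq(r, c)

def optimal_pairing(Y, X):
    """Return (partners, leftovers): partners[j] is the point of X paired with
    Y[j] (None = diagonal); leftovers are points of X paired with Y's diagonal.
    Exhaustive search stands in for the Hungarian algorithm (tiny inputs only)."""
    rows = list(Y) + [None] * len(X)
    cols = list(X) + [None] * len(Y)
    best = min(itertools.permutations(cols),
               key=lambda perm: sum(pair_cost(r, c) for r, c in zip(rows, perm)))
    partners = list(best[:len(Y)])
    leftovers = [c for c in best[len(Y):] if c is not None]
    return partners, leftovers

def mean(points):
    """w = (k*wbar + (m-k)*wbar_diag)/m; None if every point is the diagonal."""
    off = [p for p in points if p is not None]
    m, k = len(points), len(off)
    if k == 0:
        return None
    wb = (sum(p[0] for p in off) / k, sum(p[1] for p in off) / k)
    c = (wb[0] + wb[1]) / 2   # wbar_diag = (c, c)
    return ((k * wb[0] + (m - k) * c) / m, (k * wb[1] + (m - k) * c) / m)

def frechet_value(Y, diagrams):
    """F_m(Y) = (1/m) * sum_i d(Y, X_i)^2."""
    total = 0.0
    for X in diagrams:
        partners, leftovers = optimal_pairing(Y, X)
        total += sum(pair_cost(y, x) for y, x in zip(Y, partners))
        total += sum(diag_sq(x) for x in leftovers)
    return total / len(diagrams)

def frechet_mean_sketch(diagrams, max_iter=100, seed=0):
    m = len(diagrams)
    Y = list(random.Random(seed).choice(diagrams))           # step (a)
    for _ in range(max_iter):
        results = [optimal_pairing(Y, X) for X in diagrams]  # step (b)
        new_Y = []
        for j in range(len(Y)):                              # step (c)
            w = mean([partners[j] for partners, _ in results])
            if w is not None:
                new_Y.append(w)
        for _, leftovers in results:
            for x in leftovers:  # x was paired with a diagonal copy of Y
                new_Y.append(mean([x] + [None] * (m - 1)))
        if sorted(new_Y) == sorted(Y):                       # step (d)
            return Y
        Y = new_Y
    return Y

Y = frechet_mean_sketch([[(0.0, 2.0)], [(0.0, 4.0)]])   # [(0.0, 3.0)]
```

Because the sketch only reaches a diagram satisfying the stopping condition, it returns a candidate local minimum of $F_m$, in line with the discussion that follows, not a certified global Frechet mean.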

An alternative to the above greedy approach would be a brute force search over 
point configurations to find a Frechet mean. One way to do this is to list all possible 
pairings between points in each pair of diagrams. Then compute the arithmetic 
mean for all such pairings. One of these means will be a Frechet mean. While this 
approach will find the complete mean set, its combinatorial complexity is prohibitive.

3.1. Convergence of the greedy algorithm. The remainder of this section provides convergence properties for Algorithm 1. By convergence we mean that the algorithm will terminate at some point having found a local minimum. The reason for this is that at each iteration the cost function $F_m$ decreases, so the algorithm never returns to a set of pairings it has already used, and there are only finitely many combinations of pairings between points in the diagrams.

We first develop necessary and sufficient conditions for a diagram $Y$ to be a local minimum of the Frechet function of a set of persistence diagrams. We define $F_i(Z) := d(Z, X_i)^2$, the Frechet
Algorithm 1: Algorithm for computing the Frechet mean $Y$ from persistence diagrams $X_1, \ldots, X_m$.

input : persistence diagrams $\{X_1, \ldots, X_m\}$
return: Frechet mean $Y$

Draw $i \sim \mathrm{Uniform}(1, m)$;  /* randomly draw a diagram */
Initialize $Y \leftarrow X_i$;  /* initialize Y */
stop ← false;
repeat
    $K = |Y|$;  /* the number of non-diagonal points in Y */
    for $i = 1, \ldots, m$ do
        $(y^j, x_i^j) \leftarrow$ Hungarian($Y$, $X_i$);  /* compute optimal pairings between each X_i and Y using the Hungarian algorithm */
    for $j = 1, \ldots, K$ do
        $y^j \leftarrow \operatorname{mean}_{i=1,\ldots,m}(x_i^j)$;  /* set each non-diagonal point in Y to the arithmetic mean of its pairings */
    if Hungarian($Y$, $X_i$) $= (y^j, x_i^j)$ for all $i$ then stop ← true;  /* the points in the updated Y are optimally paired w.r.t. each X_i */
until stop = true;
return: $Y$



function corresponding to $\delta_{X_i}$. This allows us to define the Frechet function as $F = \frac{1}{m}\sum_{i=1}^m F_i$, corresponding to the distribution $\frac{1}{m}\sum_{i=1}^m \delta_{X_i}$.

The following lemma provides a necessary condition for a diagram to be a local minimum of $F$. This condition is the stopping criterion in Algorithm 1.

Lemma 3.1. If $W = \{w_i\}$ is a local minimum of the Frechet function $F = \frac{1}{m}\sum_{j=1}^m F_j$, then there is a unique optimal pairing from $W$ to each of the $X_j$, which we denote as $\phi_j$, and each $w_i$ is the arithmetic mean of the points $\{\phi_j(w_i)\}_{j=1,2,\ldots,m}$. Furthermore, if $w_k$ and $w_l$ are off-diagonal points such that $\|w_k - w_l\| = 0$, then $\|\phi_j(w_k) - \phi_j(w_l)\| = 0$ for each $j$.



Proof. Let $\phi_j$ be some optimal pairings (not yet assumed to be unique) between $W$ and $X_j$, and let $s_j$ be the corresponding vectors in the tangent cone at $W$ that are tangent to the geodesics induced by $\phi_j$ and are of length $d(X_j, W)$. The $2s_j$ are supporting vectors for the functions $F_j(W) = d(W, X_j)^2$ by Proposition 2.9, so $\frac{2}{m}\sum_{j=1}^m s_j$ is a supporting vector of $F$.

From Lemma 2.8 we know that $\frac{2}{m}\sum_{j=1}^m s_j = 0$. Since at each $w_i$ the vector $s_j$ points from $w_i$ to $\phi_j(w_i)$, $\sum_{j=1}^m s_j = 0$ implies that $w_i$ is the arithmetic mean of the points $\{\phi_j(w_i)\}_{j=1,2,\ldots,m}$.

Now suppose that $\phi_j$ and $\tilde\phi_j$ are both optimal pairings from $W$ to $X_j$, with corresponding tangent vectors $s_j$ and $\tilde s_j$. By the above reasoning we have $\frac{2}{m}(\tilde s_j + \sum_{r \ne j} s_r) = 0 = \frac{2}{m}\sum_{r=1}^m s_r$, and hence $\tilde s_j = s_j$. This implies that $\|\tilde\phi_j(w_i) - \phi_j(w_i)\| = 0$ for all $w_i \in W$. In particular, for off-diagonal points $w_k$ and $w_l$ with $\|w_k - w_l\| = 0$ and $\phi_j$ an optimal pairing, we can consider the pairing $\tilde\phi_j$ given by $\phi_j$ with $w_k$ and $w_l$ swapped. Since $\|\tilde\phi_j(w_i) - \phi_j(w_i)\| = 0$ for all $w_i \in W$, we can conclude that $\|\phi_j(w_k) - \phi_j(w_l)\| = 0$.

□



We now prove that the above is also a sufficient condition for $W$ to be a local minimum of $F$ when $F$ is the Frechet function for the measure $\frac{1}{m}\sum_{i=1}^m \delta_{X_i}$, with the diagrams $X_i$ each having finitely many off-diagonal points. This requires a result about a local extension of optimal pairings.

Proposition 3.2. Let $X$ and $Y$ be diagrams, each with only finitely many off-diagonal points, such that there is a unique optimal pairing $\phi_X$ between them and no off-diagonal point in $X$ matches the diagonal in $Y$. We further stipulate that if $y_k$ and $y_l$ are off-diagonal points with $\|y_k - y_l\| = 0$, then $\|\phi_X^{-1}(y_k) - \phi_X^{-1}(y_l)\| = 0$.

There is some $r > 0$ such that for every $Z \in B(Y, r)$ there is a unique optimal pairing between $X$ and $Z$, and this optimal pairing is induced from the one from $X$ to $Y$. By this we mean there is a unique optimal pairing $\phi_Y$ from $Y$ to $Z$ and that the unique optimal pairing from $X$ to $Z$ is $\phi_Y \circ \phi_X$.

Furthermore, if $X_1, X_2, \dots, X_m$ and $Y$ are diagrams with finitely many off-diagonal points such that there is a unique optimal pairing $\phi_{X_i}$ between $X_i$ and $Y$ for each $i$, with the same conditions as above, then there is some $r > 0$ such that for every $Z \in B(Y, r)$ there is a unique optimal pairing between each $X_i$ and $Z$, and this optimal pairing is induced by the one from $X_i$ to $Y$.



Proof. Since $Y$ has only finitely many off-diagonal points there is some $\epsilon > 0$ such that for every diagram $Z$ with $d(Y, Z) < \epsilon$ there is a unique geodesic from $Y$ to $Z$.

For each bijection $\phi$ of points in $X$ to points in $Y$, define the function $g_\phi$ between $X$ and points in $B(Y, \epsilon)$ by setting

$$g_\phi(X, Z) := \sum_{x \in X} \|x - \phi_Y(\phi(x))\|^2 + \sum_{\{z \in Z \,:\, \phi_Y^{-1}(z) = \Delta\}} \|z - \Delta\|^2,$$

where $\phi_Y$ is the optimal pairing that comes from the unique geodesic from $Y$ to $Z$. First note that $g_\phi(X, Z) \le \sum_{x \in X} \|x - \phi_Y(\phi(x))\|^2 + d(Y, Z)^2$. Since there are only finitely many points in $X$ and $Y$ there is a bound $M$ on $\|x - \phi(x)\| + \epsilon$; hence $M$ is a bound on $\|x - \phi_Y(\phi(x))\|$ for all $x$ and all $\phi$. We also know $\|\phi_Y(\phi(x)) - \phi(x)\| \le d(Y, Z)$ for all $x \in X$. Let $K$ be the number of off-diagonal points in the diagrams $X$ and $Y$ combined. Then

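$$\begin{aligned} g_\phi(X, Z) &\le \sum_{x \in X} \|x - \phi_Y(\phi(x))\|^2 + d(Y, Z)^2 \\ &\le \sum_{x \in X} \big(\|x - \phi(x)\| + \|\phi(x) - \phi_Y(\phi(x))\|\big)^2 + d(Y, Z)^2 \\ &\le \sum_{x \in X} \big(\|x - \phi(x)\|^2 + \|\phi(x) - \phi_Y(\phi(x))\|^2 + 2\|x - \phi(x)\|\,\|\phi(x) - \phi_Y(\phi(x))\|\big) + d(Y, Z)^2 \\ &\le g_\phi(X, Y) + 2d(Y, Z)^2 + 2MK\, d(Y, Z). \end{aligned}$$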



Similarly,

$$g_\phi(X, Y) \le g_\phi(X, Z) + 2d(Y, Z)^2 + 2MK\, d(Y, Z).$$



Let $\phi_X$ be the optimal pairing from $X$ to $Y$, which is assumed to be unique in the statement of the proposition. Let $\phi$ be another bijection of points in $X$ to points in $Y$. Since there are only finitely many off-diagonal points in $X$ and $Y$ there are only finitely many possible $\phi$. Set

$$\beta := \min_{\phi \neq \phi_X} \{g_\phi(X, Y) - g_{\phi_X}(X, Y)\} = \min_{\phi \neq \phi_X} \{g_\phi(X, Y) - d(X, Y)^2\},$$

which must be positive as $\phi_X$ is uniquely optimal by assumption.

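Choose $r > 0$ with $r \le \epsilon$ such that $4r^2 + 4MKr < \beta$. Now suppose that $g_\phi(X, Z) \le g_{\phi_X}(X, Z)$ for some $\phi \neq \phi_X$ and some $Z \in B(Y, r)$. This will imply that

$$\begin{aligned} g_\phi(X, Y) &\le g_\phi(X, Z) + 2d(Y, Z)^2 + 2MK\, d(Y, Z) \\ &\le g_{\phi_X}(X, Z) + 2d(Y, Z)^2 + 2MK\, d(Y, Z) \\ &\le g_{\phi_X}(X, Y) + 4d(Y, Z)^2 + 4MK\, d(Y, Z) \\ &< g_{\phi_X}(X, Y) + \beta, \end{aligned}$$

which contradicts our choice of $\beta$.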
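The only requirement on $r$ in this argument (besides $r \le \epsilon$) is $4r^2 + 4MKr < \beta$. Solving the quadratic gives the supremum of admissible radii, $r^* = \tfrac12\big(\sqrt{M^2K^2 + \beta} - MK\big)$, and any $0 < r < r^*$ works. A small Python check of this arithmetic; the function name `radius_sup` and the sample values are illustrative only:

```python
import math


def radius_sup(M, K, beta):
    """Supremum of r > 0 with 4 r^2 + 4 M K r < beta (assumes beta > 0)."""
    return (math.sqrt((M * K) ** 2 + beta) - M * K) / 2.0


r_star = radius_sup(M=1.0, K=2, beta=12.0)   # 1.0
r = 0.99 * r_star                            # any r strictly below r_star
assert 4 * r ** 2 + 4 * 1.0 * 2 * r < 12.0
```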

Now suppose $X_1, X_2, \dots, X_m$ and $Y$ are diagrams with finitely many off-diagonal points such that there is a unique optimal pairing $\phi_{X_i}$ between $X_i$ and $Y$ for each $i$. By the above argument there are some $r_1, r_2, \dots, r_m > 0$ such that for each $i$ and for every $Z \in B(Y, r_i)$ there is a unique optimal pairing between $X_i$ and $Z$, and this optimal pairing is induced by the one from $X_i$ to $Y$. Take $r = \min_i\{r_i\}$, which is positive. □

The following theorem states that Algorithm 1 will find a local minimum on termination.

Theorem 3.3. Given diagrams $\{X_1, \dots, X_m\}$ and the corresponding Frechet function $F$, $W = \{w_i\}$ is a local minimum of $F$ if and only if there is a unique optimal pairing from $W$ to each of the $X_j$, denoted $\phi_j$, and each $w_i$ is the arithmetic mean of the points $\{\phi_j(w_i)\}_{j=1,2,\dots,m}$.
Proof. In Lemma 3.1 we showed that it is a necessary condition.

Given $m$ points in the plane or copies of the diagonal, $\{x_1, x_2, \dots, x_m\}$, the choice of $y$ which minimizes $\sum_{i=1}^m \|x_i - y\|^2$ is the arithmetic mean of $\{x_1, \dots, x_m\}$. As a result we know that $F(Z) \ge F(W)$ for all $Z$ with the same optimal pairings as $W$ to $X_1, X_2, \dots, X_m$. By Proposition 3.2 there is some ball $B(W, r)$ such that every $Z \in B(W, r)$ has the same optimal pairings as $W$, so $F(Z) \ge F(W)$ for all $Z$ in $B(W, r)$. Thus we can conclude that $W$ is a local minimum. □
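Theorem 3.3 suggests a fixed-point iteration: pair the current estimate optimally with every observed diagram, move each point to the arithmetic mean of its partners (using the projection onto the diagonal when a point is paired with the diagonal), and stop once the pairings no longer change. The Python sketch below follows that description under explicit assumptions: Euclidean ground distance, brute-force optimal pairings (tiny diagrams only), the number of points of the estimate kept fixed, and ties between optimal pairings broken arbitrarily. It is an illustration, not the authors' implementation of Algorithm 1, and `frechet_mean_sketch` and `best_pairing` are hypothetical names.

```python
import itertools
import math


def diag_sq(p):
    # squared Euclidean distance from p = (birth, death) to the diagonal
    return (p[1] - p[0]) ** 2 / 2.0


def best_pairing(Y, X):
    """One optimal pairing between small diagrams, found by brute force.

    Entry j is the index of the point of X matched to Y[j], or None when
    Y[j] is matched to the diagonal.
    """
    p, q = len(Y), len(X)
    best_cost, best = math.inf, None
    for perm in itertools.permutations(range(p + q)):
        cost = 0.0
        for r, c in enumerate(perm):
            if r < p and c < q:
                cost += (Y[r][0] - X[c][0]) ** 2 + (Y[r][1] - X[c][1]) ** 2
            elif r < p:
                cost += diag_sq(Y[r])       # Y[r] goes to the diagonal
            elif c < q:
                cost += diag_sq(X[c])       # X[c] comes from the diagonal
        if cost < best_cost - 1e-12:        # keep the first optimum found
            best_cost = cost
            best = tuple(perm[r] if perm[r] < q else None for r in range(p))
    return best


def frechet_mean_sketch(diagrams, max_iter=100):
    Y = list(diagrams[0])                   # start from one observed diagram
    previous = None
    for _ in range(max_iter):
        pairings = [best_pairing(Y, X) for X in diagrams]
        if pairings == previous:            # pairings unchanged: stop
            break
        new_Y = []
        for j, y in enumerate(Y):
            partners = []
            for X, key in zip(diagrams, pairings):
                if key[j] is None:          # matched to the diagonal:
                    mid = (y[0] + y[1]) / 2.0   # use the projection of y
                    partners.append((mid, mid))
                else:
                    partners.append(X[key[j]])
            new_Y.append((sum(a for a, _ in partners) / len(partners),
                          sum(b for _, b in partners) / len(partners)))
        Y, previous = new_Y, pairings
    return Y


print(frechet_mean_sketch([[(0.0, 2.0)], [(0.0, 4.0)]]))   # [(0.0, 3.0)]
```

When the loop stops because the pairings repeat, each point is already the mean of its partners under those pairings, which is the condition of Theorem 3.3 provided the pairings are also unique; the sketch does not verify uniqueness.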

4. Law of large numbers for the empirical Frechet mean 



In this section we study the convergence of Frechet means computed from sample sets to the set of means of a measure. Consider a measure $\rho$ on the space of persistence diagrams $\mathcal{D}_{L^2}$. Given a set of persistence diagrams $\{X_i\}_{i=1}^n \overset{\text{iid}}{\sim} \rho$ one can define an empirical measure $\rho_n = \frac{1}{n}\sum_{k=1}^n \delta_{X_k}$. We will examine the relation between the two sets

$$\mathbb{Y} = \arg\min_{Z \in \mathcal{D}_{L^2}} \int_{\mathcal{D}_{L^2}} d(X, Z)^2 \, d\rho(X), \qquad \mathbb{Y}_n = \arg\min_{Z \in \mathcal{D}_{L^2}} \int_{\mathcal{D}_{L^2}} d(X, Z)^2 \, d\rho_n(X),$$

where $\mathbb{Y}$ and $\mathbb{Y}_n$ are the Frechet mean sets of the measures $\rho$ and $\rho_n$ respectively. We would like to prove convergence of $\mathbb{Y}_n$ to $\mathbb{Y}$ asymptotically in $n$, that is, a law of large numbers result.

There exist weak and strong laws of large numbers for general metric spaces (for example see [17, Theorem 3.4]). These results hold for global minima of the Frechet and empirical Frechet functions $F$ and $F_n$, respectively. It is not clear to us how to adapt these results to the case of Algorithm 1, where we can only ensure convergence to a local minimum. It is also not clear how we can adapt these theorems to get rates of convergence of the sample Frechet mean set to the population quantity.

In this section we provide a law of large numbers result for the restricted case where $\rho$ is a combination of Dirac masses

$$\rho = \frac{1}{m}\sum_{i=1}^m \delta_{Z_i},$$

where the $Z_i$ are diagrams with only finitely many off-diagonal points and we allow for multiplicity in these points. The proof is constructive and we provide rates of convergence.



The main results of this section, Theorem 4.1 and Lemma 4.2, provide a probabilistic justification for Algorithm 1. Theorem 4.1 states that with high probability local minima of the empirical Frechet function $F_n$ will be close to local minima of the Frechet function $F$. Ideally we would like the above convergence to hold for global minima, the Frechet mean set. Lemma 4.2 states that the number of local minima of $F_n$ is finite and bounded by a quantity that does not depend on $n$. This suggests that applying Algorithm 1 to a random set of start conditions can be used to explore the finite set of local minima.

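Theorem 4.1. Set $\rho = \frac{1}{m}\sum_{i=1}^m \delta_{Z_i}$, where the $Z_i$ are diagrams with finitely many off-diagonal points with multiplicity allowed. Let $F$ be the Frechet function corresponding to $\rho$ and $Y$ be a local minimum of $F$. Set $\{X_k\}_{k=1}^n \overset{\text{iid}}{\sim} \rho$, and denote the corresponding empirical measure $\rho_n = \frac{1}{n}\sum_{k=1}^n \delta_{X_k}$ and Frechet function $F_n$. There exists a local minimum $Y_n$ of $F_n$ such that with probability greater than $1 - \delta$

$$d(Y, Y_n)^2 \le \frac{m^4 F(Y)}{n}\ln\frac{2m}{\delta}$$

for $n \ge 8m^2\ln\frac{2m}{\delta}$ and $\frac{m^4 F(Y)}{n}\ln\frac{2m}{\delta} < r^2$, where $r$ characterizes the separation between the local minima of $F$.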



Proof. The empirical distribution is

$$\rho_n = \frac{1}{n}\sum_{k=1}^n \delta_{X_k} = \frac{1}{n}\sum_{i=1}^m \xi_i \delta_{Z_i},$$

where $\xi_i$ is the random variable that states the multiplicity of $Z_i$ in the empirical measure, $\xi_i = |\{k : X_k = Z_i\}|$. Observe that $(\xi_1, \xi_2, \dots, \xi_m)$ has a multinomial distribution with parameters $n$ and $p = (\frac{1}{m}, \frac{1}{m}, \dots, \frac{1}{m})$.

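We will bound the probability that $|\xi_i - \frac{n}{m}| \ge \epsilon\frac{n}{m}$ for any $i = 1, 2, \dots, m$. We will then show that, under the assumption that $|\xi_i - \frac{n}{m}| < \epsilon\frac{n}{m}$ for all $i = 1, 2, \dots, m$ for sufficiently small $\epsilon > 0$, there is a local minimum $Y_n$ with $d(Y, Y_n)^2 \le \frac{\epsilon^2 m^2 F(Y)}{(1-\epsilon)^2}$.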

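For each $i$, $\xi_i \sim \mathrm{Bin}(n, \frac{1}{m})$ and $n - \xi_i \sim \mathrm{Bin}(n, 1 - \frac{1}{m})$. Using Hoeffding's inequality we obtain

$$\Pr\Big[\xi_i - \frac{n}{m} \le -\epsilon\frac{n}{m}\Big] \le \exp\Big(-\frac{2\epsilon^2 n}{m^2}\Big)$$

and

$$\Pr\Big[(n - \xi_i) - \Big(n - \frac{n}{m}\Big) \le -\epsilon\frac{n}{m}\Big] \le \exp\Big(-\frac{2\epsilon^2 n}{m^2}\Big).$$

Together they show that

$$\Pr\Big[\Big|\xi_i - \frac{n}{m}\Big| \ge \epsilon\frac{n}{m}\Big] \le 2\exp\Big(-\frac{2\epsilon^2 n}{m^2}\Big),$$

and a union bound over $i$ implies the bound

$$\Pr\Big[\Big|\xi_i - \frac{n}{m}\Big| < \epsilon\frac{n}{m} \text{ for all } i = 1, 2, \dots, m\Big] \ge 1 - 2m\exp\Big(-\frac{2\epsilon^2 n}{m^2}\Big).$$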
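The union bound above can be checked by simulation. The Python sketch below draws the counts $(\xi_1, \dots, \xi_m)$ by sampling $n$ indices uniformly from $\{1, \dots, m\}$ and compares the observed frequency of the event $|\xi_i - n/m| \ge \epsilon n/m$ for some $i$ with $2m\exp(-2\epsilon^2 n/m^2)$. The function names and parameter values are illustrative assumptions, and the simulation is only a sanity check, not part of the proof.

```python
import math
import random


def hoeffding_failure_bound(n, m, eps):
    # union bound over the m counts: 2 m exp(-2 eps^2 n / m^2)
    return 2 * m * math.exp(-2 * eps ** 2 * n / m ** 2)


def empirical_failure_rate(n, m, eps, trials=1000, seed=0):
    """Fraction of trials in which |xi_i - n/m| >= eps * n/m for some i."""
    rng = random.Random(seed)
    failures = 0
    for _ in range(trials):
        counts = [0] * m
        for _ in range(n):
            counts[rng.randrange(m)] += 1   # X_k is one of Z_1, ..., Z_m, uniformly
        if any(abs(c - n / m) >= eps * n / m for c in counts):
            failures += 1
    return failures / trials


n, m, eps = 400, 4, 0.3
print(empirical_failure_rate(n, m, eps), hoeffding_failure_bound(n, m, eps))
```

For these parameters the bound equals $8e^{-4.5} \approx 0.089$, while the simulated rate is typically much smaller, which illustrates how conservative the union bound is.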



From now on we will assume that $|\xi_i - \frac{n}{m}| < \epsilon\frac{n}{m}$ for all $i = 1, 2, \dots, m$. Let us consider our algorithm for finding a local minimum of $F_n$ starting at the point $Y$. We first define some notation. We denote the points in $Y$ by $\{y_j\}_j$, and we denote by $z_i^j$ the point in $Z_i$ that $y_j$ is paired to in the (unique) optimal bijection between $Y$ and $Z_i$. Recall that $z_i^j$ could be the diagonal, but from our assumption that $Y$ is a local minimum no off-diagonal point in any $Z_i$ is paired with the diagonal in $Y$.
Let $(a_i^j, b_i^j)$ be the coefficients of the vector from $y_j$ to $z_i^j$ in the basis of $\mathbb{R}^2$ given by $(\frac{1}{\sqrt2}, \frac{1}{\sqrt2})$ and $(\frac{1}{\sqrt2}, -\frac{1}{\sqrt2})$. This basis has the advantage that when $z_i^j$ is the diagonal, $a_i^j = 0$ and $b_i^j = d(y_j, \Delta)$. From our assumption that $Y$ is a local minimum we know that $\sum_{i=1}^m a_i^j = 0$ and $\sum_{i=1}^m b_i^j = 0$ for all $j$, and

$$\sum_{j}\sum_{i=1}^m \big((a_i^j)^2 + (b_i^j)^2\big) = mF(Y).$$

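For the moment fix $j$. Without loss of generality reorder the $Z_i$ so that the first $k$ (with $1 \le k \le m$) of the $z_i^j$ are off the diagonal and the remainder are copies of the diagonal. Let $y_j^n$ be the point in $\mathbb{R}^2$ given by

$$y_j^n = y_j + \frac{\sum_{i=1}^k \xi_i a_i^j}{\sum_{i=1}^k \xi_i}\Big(\frac{1}{\sqrt2}, \frac{1}{\sqrt2}\Big) + \frac{1}{n}\Big(\sum_{i=1}^m \xi_i b_i^j\Big)\Big(\frac{1}{\sqrt2}, -\frac{1}{\sqrt2}\Big).$$

By construction this $y_j^n$ is the weighted arithmetic mean of the $z_i^j$, where we have weighted by the $\xi_i$ (recall $\sum_{i=1}^m \xi_i = n$), taking into account that when $i > k$ the point $z_i^j$ is the diagonal, so that it contributes the projection of $y_j^n$ onto the diagonal.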

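Under our assumption that $|\xi_i - \frac{n}{m}| < \epsilon\frac{n}{m}$ for all $i = 1, 2, \dots, m$, and using $\sum_{i=1}^m a_i^j = 0 = \sum_{i=1}^m b_i^j$ (note $a_i^j = 0$ for $i > k$) together with $\sum_{i=1}^k \xi_i \ge (1-\epsilon)\frac{n}{m}$, we know that

$$\begin{aligned} \|y_j - y_j^n\|^2 &= \frac{\big(\sum_{i=1}^k (\xi_i - \frac{n}{m}) a_i^j\big)^2}{\big(\sum_{i=1}^k \xi_i\big)^2} + \frac{1}{n^2}\Big(\sum_{i=1}^m \big(\xi_i - \tfrac{n}{m}\big) b_i^j\Big)^2 \\ &\le \frac{\epsilon^2}{(1-\epsilon)^2}\Big(\sum_{i=1}^m |a_i^j|\Big)^2 + \frac{\epsilon^2}{m^2}\Big(\sum_{i=1}^m |b_i^j|\Big)^2 \\ &\le \frac{m\epsilon^2}{(1-\epsilon)^2}\sum_{i=1}^m \big((a_i^j)^2 + (b_i^j)^2\big), \end{aligned}$$

where the last step uses the Cauchy-Schwarz inequality.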
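The inequality above can be sanity-checked numerically for a single index $j$. The Python sketch below builds a hypothetical instance: $k$ off-diagonal partners and $m - k$ diagonal partners, with $y_j$ placed at their arithmetic mean so that the $a_i^j$ and the $b_i^j$ each sum to zero. It then perturbs the multiplicities within the band $|\xi_i - n/m| < \epsilon n/m$ while keeping $\sum_i \xi_i = n$, computes the weighted mean $y_j^n$ in the rotated coordinates, and compares the squared shift with the right-hand side. The function names and random data are illustrative assumptions.

```python
import math
import random


def to_uv(p):
    # coefficients of p in the basis (1/sqrt2, 1/sqrt2), (1/sqrt2, -1/sqrt2)
    s = 1 / math.sqrt(2)
    return ((p[0] + p[1]) * s, (p[0] - p[1]) * s)


def mean_shift_and_bound(m=5, k=3, eps=0.2, n=1000.0, seed=1):
    """Return (||y_j - y_j^n||^2, m eps^2/(1-eps)^2 * sum_i (a_i^2 + b_i^2))."""
    rng = random.Random(seed)
    # hypothetical partners: k off-diagonal points (death > birth), m - k diagonal copies
    uz = [to_uv((rng.uniform(0, 1), rng.uniform(1, 2))) for _ in range(k)]
    # y_j is the arithmetic mean of its partners (diagonal copies act as the
    # projection of y_j), so the a_i and b_i below each sum to zero
    u_y = sum(u for u, _ in uz) / k
    v_y = sum(v for _, v in uz) / m
    a = [u - u_y for u, _ in uz] + [0.0] * (m - k)
    b = [v - v_y for _, v in uz] + [-v_y] * (m - k)
    # multiplicities xi_i with sum n and |xi_i - n/m| < eps * n/m
    c = eps * (n / m) / m
    d = [rng.uniform(-c, c) for _ in range(m - 1)]
    xi = [n / m + t for t in d] + [n / m - sum(d)]
    # weighted mean y_j^n: a diagonal partner sits at the projection of y_j^n,
    # i.e. it has the same u-coordinate and v = 0, which gives these closed forms
    u_n = sum(x * u for x, (u, _) in zip(xi, uz)) / sum(xi[:k])
    v_n = sum(x * v for x, (_, v) in zip(xi, uz)) / n
    shift = (u_n - u_y) ** 2 + (v_n - v_y) ** 2
    bound = m * eps ** 2 / (1 - eps) ** 2 * sum(ai ** 2 + bi ** 2 for ai, bi in zip(a, b))
    return shift, bound


for seed in range(5):
    shift, bound = mean_shift_and_bound(seed=seed)
    assert shift <= bound
```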



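Set $Y_n$ to be the diagram with off-diagonal points $\{y_j^n\}_j$. Using the pairing between $Y$ and $Y_n$ where we pair $y_j$ with $y_j^n$, we conclude that

$$d(Y, Y_n)^2 \le \sum_j \|y_j - y_j^n\|^2 \le \frac{m\epsilon^2}{(1-\epsilon)^2}\sum_j\sum_{i=1}^m \big((a_i^j)^2 + (b_i^j)^2\big) = \frac{m^2\epsilon^2 F(Y)}{(1-\epsilon)^2}.$$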
Set $\delta = 2m\exp(-\frac{2\epsilon^2 n}{m^2})$ and solve for $\epsilon$, which gives $\epsilon^2 = \frac{m^2}{2n}\ln\frac{2m}{\delta}$. This provides the bound that with probability greater than $1 - \delta$

$$d(Y, Y_n)^2 \le \frac{m^4 F(Y)}{2n}\ln\Big(\frac{2m}{\delta}\Big)\frac{1}{(1-\epsilon)^2}.$$

For $\epsilon \in [0, 0.25]$ it holds that $(1-\epsilon)^{-2} \le 2$, and $n \ge 8m^2\ln\frac{2m}{\delta}$ implies $\epsilon \le 0.25$.
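The bookkeeping in this step can be made concrete. Assuming the failure probability $2m\exp(-2\epsilon^2 n/m^2)$ derived above, the Python sketch below inverts it for $\epsilon$, computes the sample size that guarantees $\epsilon \le 1/4$, and compares the sharper bound $m^2\epsilon^2F(Y)/(1-\epsilon)^2$ with the simplified bound $m^4F(Y)\ln(2m/\delta)/n$. The value used for $F(Y)$ is a placeholder and the function names are illustrative.

```python
import math


def eps_from_delta(n, m, delta):
    # solve delta = 2 m exp(-2 eps^2 n / m^2) for eps
    return m * math.sqrt(math.log(2 * m / delta) / (2 * n))


def min_sample_size(m, delta):
    # n >= 8 m^2 ln(2m/delta) guarantees eps <= 1/4
    return math.ceil(8 * m ** 2 * math.log(2 * m / delta))


def squared_distance_bounds(n, m, delta, F_Y):
    """Return (m^2 eps^2 F(Y) / (1-eps)^2, m^4 F(Y) ln(2m/delta) / n)."""
    eps = eps_from_delta(n, m, delta)
    sharp = m ** 2 * eps ** 2 * F_Y / (1 - eps) ** 2
    simple = m ** 4 * F_Y * math.log(2 * m / delta) / n
    return sharp, simple


m, delta = 3, 0.05
n = min_sample_size(m, delta)            # 345
eps = eps_from_delta(n, m, delta)        # about 0.2499
sharp, simple = squared_distance_bounds(n, m, delta, F_Y=1.0)  # F_Y is a placeholder
assert eps <= 0.25 and sharp <= simple
```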

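We want to show that $Y_n$ is a local minimum of $F_n$ for sufficiently small $\epsilon$. Indeed, it will be the output of Algorithm 1 given the initializing diagram $Y$. Since $Y$ is a local minimum, Proposition 3.2 implies that there is a ball $B(Y, r)$ around $Y$ such that for every diagram in $B(Y, r)$ there is a unique optimal pairing with each $Z_i$, which corresponds to the unique optimal pairing between $Y$ and $Z_i$. That is, for any $X \in B(Y, r)$ the optimal pairing between $Z_i$ and $X$ is induced by the optimal pairing between $Z_i$ and $Y$. For $\epsilon > 0$ such that $\frac{m^2\epsilon^2 F(Y)}{(1-\epsilon)^2} < r^2$ we have $Y_n \in B(Y, r)$. Plugging in for $\epsilon$ and using $(1-\epsilon)^{-2} \le 2$ shows that this holds whenever $\frac{m^4 F(Y)}{n}\ln\frac{2m}{\delta} < r^2$.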

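This implies that the unique optimal pairing between $Y_n$ and $Z_i$ is the one induced by the optimal pairing between $Y$ and $Z_i$ (composed with the pairing $y_j \mapsto y_j^n$) for all $i$, and hence the same holds for each of the sample diagrams $X_k$. If $X_k = Z_i$, then the point of $X_k$ paired with $y_j^n$ is $x_k^j = z_i^j$.

By construction $y_j^n$ is the weighted arithmetic mean of the $z_i^j$ (weighted by the $\xi_i$), and hence $y_j^n$ is the arithmetic mean of the $x_k^j$. By Theorem 3.3, $Y_n$ is a local minimum of $F_n$. □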

The above theorem provides a (weak) law of large numbers result for the local minima computed from $n$ persistence diagrams, but it does not ensure that the number of local minima stays bounded as $n$ goes to infinity. The utility of such a convergence result would be limited if the number of local minima could not be bounded. The following lemma states that the number of local minima is bounded independently of $n$.

Lemma 4.2. Let $\rho = \frac{1}{m}\sum_{i=1}^m \delta_{Z_i}$ as before. Let $\rho_n = \frac{1}{n}\sum_{k=1}^n \delta_{X_k}$ be the empirical measure of $n$ points drawn iid from $\rho$ and $F_n$ the corresponding Frechet function. The number of local minima of $F_n$ is bounded by $\prod_{i=1}^m (k_i + 1)^{k_1 + k_2 + \cdots + k_m}$. Here $k_i$ is the number of off-diagonal points in the $i$-th diagram $Z_i$. This bound is independent of $n$.

Proof. Let $Y_n$ be a local minimum of $F_n$. This implies there are unique optimal pairings $\phi_k$ between $Y_n$ and $X_k$ for each $k$, and that any point $y$ in $Y_n$ is the arithmetic mean of $\{\phi_k(y)\}_{k=1}^n$. Since the optimal pairing is unique, if $X_k = X_l$ then $\phi_k = \phi_l$. This in turn means that the $\phi_k$ are determined by the optimal pairings between $Y_n$ and the distinct diagrams $Z_i$ that appear among the $X_k$ (with multiplicity). This implies that the number of local minima is bounded by the number of different partitions of the points in $\bigcup_i Z_i$ into subsets such that each subset has exactly one point (possibly a copy of the diagonal) from each of the $Z_i$. The number of subsets is bounded by $k_1 + k_2 + \cdots + k_m$, and for each subset there are at most $\prod_{i=1}^m (k_i + 1)$ choices of which element to take from each of the $Z_i$. Thus the number of different partitions is bounded by $\prod_{i=1}^m (k_i + 1)^{k_1 + k_2 + \cdots + k_m}$. □
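To get a feel for the size of this bound, which is independent of $n$ but grows very quickly with the number of off-diagonal points, here is a direct evaluation in Python; the function name `local_minima_bound` is illustrative.

```python
from math import prod


def local_minima_bound(ks):
    """prod_i (k_i + 1) ** (k_1 + ... + k_m), with k_i the off-diagonal counts of Z_i."""
    total = sum(ks)
    return prod((k + 1) ** total for k in ks)


assert local_minima_bound([1, 2]) == 216   # (2 * 3) ** 3
```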
We would like to discuss not only the convergence of local minima but also the
convergence of the Frechet means. We can do this in the case when there is a 
unique Frechet mean. 

Lemma 4.3. Let $\rho = \frac{1}{m}\sum_{i=1}^m \delta_{Z_i}$ as before. Suppose further that the corresponding Frechet function $F$ has a unique minimum. Let $\rho_n = \frac{1}{n}\sum_{k=1}^n \delta_{X_k}$ be the empirical measure of $n$ points drawn iid from $\rho$ and $F_n$ the corresponding Frechet function. Let $Y$ be the Frechet mean of $F$ and $\mathbb{Y}_n$ the set of Frechet means of $F_n$. With probability 1 the Hausdorff distance between $\mathbb{Y}_n$ and $Y$ goes to zero as $n$ goes to infinity.

Proof. It is sufficient for us to show for each $r > 0$ that with probability 1 there is some $N_r$ such that $\mathbb{Y}_n \subset B(Y, r)$ for all $n \ge N_r$.

Fix $r > 0$. Suppose there does not exist some $N_r$ such that $\mathbb{Y}_n \subset B(Y, r)$ for all $n \ge N_r$. Then there is some sequence $W_{n_k} \in \mathbb{Y}_{n_k}$ such that $d(W_{n_k}, Y) \ge r$. The set $\{W_{n_k}\}$ is clearly bounded, off-diagonally birth-death bounded and uniform, and hence precompact. This implies that $(W_{n_k})$ has a convergent subsequence $(W_{n_{k_j}})$. Let $W$ denote the limit of this sequence. Since $d(W_{n_{k_j}}, Y) \ge r$ for all $j$ we have $d(W, Y) \ge r$.



By the arguments in Proposition 2.6 there is some $K$ independent of $n$ such that $F_n$ is $K$-Lipschitz in $B(W, 1)$, and hence $|F_{n_{k_j}}(W_{n_{k_j}}) - F_{n_{k_j}}(W)| \le K\, d(W_{n_{k_j}}, W)$ for large $j$. Hence, for all $\epsilon > 0$ we can say that $F_{n_{k_j}}(W) \le F_{n_{k_j}}(W_{n_{k_j}}) + \epsilon$ for sufficiently large $j$.

The law of large numbers tells us that $F_n(W) \to F(W)$ and $F_n(Y) \to F(Y)$ as $n \to \infty$ with probability 1. Hence for all $\epsilon > 0$ we know that with probability 1 both $F(W) \le F_n(W) + \epsilon$ and $F_n(Y) \le F(Y) + \epsilon$ for sufficiently large $n$.

From our assumption that $W_{n_{k_j}}$ is a Frechet mean of $F_{n_{k_j}}$ we know that $F_{n_{k_j}}(W_{n_{k_j}}) \le F_{n_{k_j}}(Y)$ for all $j$.

Let $\epsilon > 0$. Combining the inequalities above we conclude that with probability 1

$$F(W) \le F_{n_{k_j}}(W) + \epsilon \le F_{n_{k_j}}(W_{n_{k_j}}) + 2\epsilon \le F_{n_{k_j}}(Y) + 2\epsilon \le F(Y) + 3\epsilon$$

for $j$ sufficiently large. Since $\epsilon > 0$ was arbitrary we obtain $F(W) \le F(Y)$. As $d(W, Y) \ge r > 0$, this contradicts the uniqueness assumption about the Frechet mean. □



5. Persistence diagrams of random Gaussian fields 



We illustrate the utility of our algorithm in computing means and variances of persistence diagrams in this section via simulation. The idea will be to show that persistence diagrams generated from a random Gaussian field will concentrate around the diagonal, with the mean diagram moving closer to the diagonal as the number of diagrams averaged increases.

The persistence diagrams were computed from random Gaussian fields over the unit square using the procedure outlined in Section 3 of [1]. The field generated is a stationary, isotropic, and infinitely differentiable random field. The Gaussian field was set to have mean zero and covariance function $R(p) = \exp(-\alpha\|p\|^2)$ with $\alpha = 100$. A few hundred levels in the range of the realization of the field were taken, and for each level a simplicial complex was constructed. This was done by taking a fine grid on the unit square and including any vertex, edge or square in the complex if and only if the values of the field at the vertex or set of vertices (for the edge and square cases) were higher than the level. The complex grows as the level decreases, which provides the filtration from which the birth-death values of the diagram were computed. We obtained 10,000 such random persistence diagrams, generated as described above, from E. Subag. These diagrams contain points with infinite persistence; we ignore these points. Using extended persistence in computing the diagrams would address this issue.
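For readers who want to experiment, the following Python sketch computes the dimension-zero part of such a diagram for a field sampled on a grid, using the superlevel-set filtration described above (a vertex enters at its value, and an edge between grid neighbours enters once both endpoints have entered; squares do not affect dimension zero) together with the standard union-find "elder rule". It is a hypothetical illustration, not the code used to produce the 10,000 diagrams. Like the text, it discards the point of infinite persistence, and it reports pairs as (birth level, death level) with birth above death.

```python
def superlevel_h0(field):
    """Finite H_0 pairs (birth, death) of the superlevel filtration of a grid."""
    parent, birth, pairs = {}, {}, []

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]   # path halving
            a = parent[a]
        return a

    cells = sorted(((val, (r, c)) for r, row in enumerate(field)
                    for c, val in enumerate(row)), reverse=True)
    for val, v in cells:                    # process levels in decreasing order
        parent[v], birth[v] = v, val
        r, c = v
        for nb in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if nb not in parent:
                continue                    # neighbour not yet in the complex
            ra, rb = find(v), find(nb)
            if ra == rb:
                continue
            if birth[ra] < birth[rb]:       # elder rule: keep the older component
                ra, rb = rb, ra
            if birth[rb] > val:             # skip zero-persistence pairs
                pairs.append((birth[rb], val))
            parent[rb] = ra
    return pairs


print(superlevel_h0([[3, 0, 2]]))   # [(2, 0)]
```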
Figure 1. The top two rows plot the mean persistence diagram for dimension zero. Each panel contains four means computed from the number of diagrams specified in the panel title. Each mean is computed from a different random sample of diagrams and is plotted in a different color. The bottom two rows show the same plots for dimension one.
In Figure 1 we display the mean diagram of sets of 2, 4, 8, 16, 32, 64 and 128 diagrams randomly drawn from the 10,000 diagrams. This is done for both dimensions zero and one. We wanted to see whether the Frechet means converge as the number of diagrams being averaged increases. To quantify this concentration we took ten draws of 2, 4, 8, 16, 32, 64 and 128 diagrams from the 10,000 diagrams and considered the distribution $\frac{1}{10}\sum_{i=1}^{10} \delta_{X_i}$, where the $X_i$ were the Frechet means of each of the sets of samples. We then computed the variance of these distributions, as documented in Table 1.

Table 1. Variance of the sample Frechet means

Number of samples    H_0       H_1
2                    0.8353    0.9058
4                    0.6295    0.6741
8                    0.4429    0.5608
16                   0.4356    0.4618
32                   0.3165    0.3742
64                   0.3362    0.2965
128                  0.3127    0.2233



6. Discussion 

In this paper we introduce an algorithm for computing estimates of Frechet means of a set of persistence diagrams. We demonstrate local convergence of this algorithm and provide a law of large numbers for the Frechet mean computed on this set when the underlying measure has the form $\rho = m^{-1}\sum_{i=1}^m \delta_{X_i}$, where the $X_i$ are persistence diagrams. We believe that generically there is a unique global minimum to the Frechet function, and hence a unique Frechet mean, but this needs to be shown.

The work in this paper is a first step and several obvious extensions are needed. A law of large numbers result when the underlying measure is not restricted to a combination of Dirac masses is obviously important. The results in our paper are strongly dependent on the $L^2$-Wasserstein metric; generalizing these results to the Wasserstein metrics used in computational topology is of central interest. The proofs and problem formulation in this paper are very constructive: the proofs and algorithms are developed for the specific examples and constructions we propose and are not meant to generalize to other metrics or variants of the algorithm. It would be of great interest to provide a presentation of the core ideas in the algorithm and theory we developed in a more general framework, using properties of abstract metric spaces and probability theory on these spaces.
Appendix A.



In order to prove Proposition 2.3 we need to give some conditions for a subset of $\mathcal{D}_{L^2}$ to be relatively compact. We will use Theorem 21 in [16], which requires a few definitions.

Definition A.1 (Birth-death bounded). A set $S \subset \mathcal{D}_{L^2}$ is called birth-death bounded if there is a constant $C > 0$ such that for all $Z \in S$ and for all $\Delta \neq x \in Z$, $\max\{|b(x)|, |d(x)|\} \le C$, where $b(x)$ and $d(x)$ are the birth and death of $x$ respectively.

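For $\alpha > 0$ and a diagram $Z \in \mathcal{D}_{L^2}$ we define the maps

$$u_\alpha : \mathcal{D}_{L^2} \to \mathcal{D}_{L^2} \text{ such that } \Delta \neq x \in u_\alpha(Z) \iff x \in Z \text{ and } \mathrm{pers}(x) \ge \alpha,$$

$$l_\alpha : \mathcal{D}_{L^2} \to \mathcal{D}_{L^2} \text{ such that } \Delta \neq x \in l_\alpha(Z) \iff x \in Z \text{ and } \mathrm{pers}(x) < \alpha,$$

where $u_\alpha(Z)$ is the $\alpha$-upper part of $Z$ (the points in $Z$ with persistence at least $\alpha$) and $l_\alpha(Z)$ is the $\alpha$-lower part of $Z$ (the points in $Z$ with persistence less than $\alpha$).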

Definition A.2 (Off-diagonally birth-death bounded). A set $S \subset \mathcal{D}_{L^2}$ is called off-diagonally birth-death bounded if for all $\epsilon > 0$, $u_\epsilon(S)$ is birth-death bounded.

Definition A.3 (Uniform). A set $S \subset \mathcal{D}_{L^2}$ is called uniform if for all $\epsilon > 0$ there exists $\alpha > 0$ such that $d(l_\alpha(Z), \Delta) \le \epsilon$ for all $Z \in S$.

Theorem 21 in [16] states that a subset of the space of persistence diagrams considered there is relatively compact if and only if it is bounded, off-diagonally birth-death bounded and uniform. This also holds for $\mathcal{D}_{L^2}$ due to the equivalence in norms stated in [2]. We are finally ready to prove Proposition 2.3.



Proof of Proposition 2.3. Fix two diagrams $X$ and $Y$. Let $\Phi$ be the set of bijections $\phi$ between points in $X$ and points in $Y$ with the further condition that

$$\|x - \phi(x)\|^2 \le \|x - \Delta\|^2 + \|\phi(x) - \Delta\|^2$$

for all $x \in X$. Recall that by $\|x - \Delta\|$ we mean the perpendicular distance from $x$ to the diagonal, which can be thought of as pairing $x$ with the closest point to $x$ on the diagonal. By the above condition we are requiring that we never pair an off-diagonal point $x \in X$ with an off-diagonal point in $Y$ when pairing both with the diagonal would be more efficient.

By considering only the bijections in $\Phi$ we are only removing bijections $\phi$ for which there exists some $\psi \in \Phi$ such that $\sum_{x \in X}\|x - \psi(x)\|^2 < \sum_{x \in X}\|x - \phi(x)\|^2$. This means that (1) is equal to $\inf\{\sum_{x \in X}\|x - \phi(x)\|^2 : \phi \in \Phi\}$. We will show this infimum is a minimum.

For each bijection $\phi \in \Phi$ we can construct a path $\gamma_\phi : [0, 1] \to \mathcal{D}_{L^2}$ by setting $\gamma_\phi(t)$ to be the diagram with points $\{(1 - t)x + t\phi(x) \mid x \in X\}$. Let $S = \{\gamma_\phi(t) : t \in [0, 1], \phi \in \Phi\}$, which contains all the images of the paths $\gamma_\phi$. We want to show that $S$ is relatively compact. To do this we will show that $S$ is bounded, off-diagonally birth-death bounded and uniform, which are sufficient conditions for relative compactness by Theorem 21 in [16].

Firstly observe that for any bijection $\phi$ and any $t \in [0, 1]$ we know

$$d(\gamma_\phi(t), \Delta)^2 \le d(X, \Delta)^2 + d(Y, \Delta)^2,$$

which is finite and independent of $\phi$ and $t$. This implies that the set $S$ is bounded.

We now wish to show that $S$ is off-diagonally birth-death bounded. For each $\epsilon > 0$ there can only be finitely many points in $X$ and $Y$ whose distance from the diagonal is at least $\epsilon$. This implies that there is some $C_\epsilon$ such that all $x \in u_\epsilon(X)$ and $x \in u_\epsilon(Y)$ satisfy $\max\{|b(x)|, |d(x)|\} \le C_\epsilon$. Let $M := \max\{d(x, \Delta) : x \in X \text{ or } x \in Y\}$. We will show that if $p \in u_\epsilon(Z)$ for some $Z \in S$, then $\max\{|b(p)|, |d(p)|\} \le C_\epsilon + \sqrt2 M$.

Consider $p \in Z$ for some $Z \in S$. This means $p \in \gamma_\phi(t)$ with $\phi \in \Phi$ and $t \in [0, 1]$, and hence $p = (1 - t)x + t\phi(x)$ for some $x \in X$. We have

$$b(p) \in [\min\{b(x), b(\phi(x))\}, \max\{b(x), b(\phi(x))\}],$$

$$d(p) \in [\min\{d(x), d(\phi(x))\}, \max\{d(x), d(\phi(x))\}],$$

$$d(p, \Delta) \in [\min\{d(x, \Delta), d(\phi(x), \Delta)\}, \max\{d(x, \Delta), d(\phi(x), \Delta)\}].$$



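In order for $d(p, \Delta) \ge \epsilon$, either $d(x, \Delta) \ge \epsilon$ or $d(\phi(x), \Delta) \ge \epsilon$, and hence $\min\{|b(x)|, |b(\phi(x))|\} \le C_\epsilon$ and $\min\{|d(x)|, |d(\phi(x))|\} \le C_\epsilon$.

The condition for $\phi$ to be in $\Phi$ is that $\|x - \Delta\|^2 + \|\phi(x) - \Delta\|^2 \ge \|x - \phi(x)\|^2$, and hence $\|x - \phi(x)\| \le \sqrt2 M$. Since $|b(x) - b(\phi(x))| \le \|x - \phi(x)\|$ we can conclude that

$$\max\{|b(x)|, |b(\phi(x))|\} \le \min\{|b(x)|, |b(\phi(x))|\} + \sqrt2 M \le C_\epsilon + \sqrt2 M.$$

Similarly we get $\max\{|d(x)|, |d(\phi(x))|\} \le C_\epsilon + \sqrt2 M$. Since $|b(p)| \le \max\{|b(x)|, |b(\phi(x))|\}$ and $|d(p)| \le \max\{|d(x)|, |d(\phi(x))|\}$, this proves the claim.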

We now will show that $S$ is uniform. Recall that $S$ is uniform if for all $\epsilon > 0$ there exists an $\alpha > 0$ such that $d(l_\alpha(Z), \Delta) \le \epsilon$ for all $Z \in S$. For any diagram $Z \in \mathcal{D}_{L^2}$ denote by $M_k(Z)$ the number of points in $Z$ whose distance to the diagonal is in $[2^{-k}, 2^{-k+1})$ for $k \ge 1$, and let $M_0(Z)$ be the number of points with distance in $[1, \infty)$. Let $N_k(Z)$ denote the number of points in $Z$ whose distance from the diagonal is at least $2^{-k}$ (in other words the number of off-diagonal points in $u_{2^{-k}}(Z)$).
Let $X \cup Y$ be the diagram whose off-diagonal points are the union of the off-diagonal points in $X$ and $Y$. Consider the following sum:

$$\sum_{j=0}^{\infty} N_j(X \cup Y)\, 2^{-2j} = \sum_{j=0}^{\infty}\Big(\sum_{k=0}^{j} M_k(X \cup Y)\Big) 2^{-2j} = \sum_{k=0}^{\infty} M_k(X \cup Y) \sum_{j=k}^{\infty} 2^{-2j} = \frac{4}{3}\sum_{k=0}^{\infty} M_k(X \cup Y)\, 2^{-2k} \le \frac{4}{3}\, d(X \cup Y, \Delta)^2 < \infty.$$
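The interchange of summation above can be checked on a finite example. The Python sketch below takes a hypothetical list of (positive) distances to the diagonal, builds the counts $M_k$ and $N_j$ as defined in the text, and compares $\sum_j N_j 2^{-2j}$ with $\tfrac43\sum_k M_k 2^{-2k}$ and with $\tfrac43\, d(X \cup Y, \Delta)^2$. The infinite tail is added in closed form, because $N_j$ is constant once every point has been counted; the function names are illustrative.

```python
def bucket(t):
    # index k with t in [2^-k, 2^-k+1) for k >= 1, or k = 0 when t >= 1 (needs t > 0)
    k = 0
    while t < 2.0 ** -k:
        k += 1
    return k


def dyadic_sums(dists):
    """Return (sum_j N_j 4^-j, (4/3) sum_k M_k 4^-k, sum of squared distances)."""
    ks = [bucket(t) for t in dists]
    top = max(ks)
    M = [ks.count(k) for k in range(top + 1)]
    N = [sum(M[: j + 1]) for j in range(top + 1)]
    lhs = sum(N[j] * 4.0 ** -j for j in range(top + 1))
    # for j > top every point is counted, so N_j = len(dists): geometric tail
    lhs += len(dists) * 4.0 ** -(top + 1) * 4 / 3
    rhs = 4 / 3 * sum(M[k] * 4.0 ** -k for k in range(top + 1))
    return lhs, rhs, sum(t * t for t in dists)


lhs, rhs, d2 = dyadic_sums([1.5, 0.3, 0.3, 0.05])   # hypothetical distances
assert abs(lhs - rhs) < 1e-12 and rhs <= 4 / 3 * d2
```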
Let $\epsilon > 0$. Since $\sum_{j=0}^{\infty} N_j(X \cup Y)\, 2^{-2j}$ converges, there is some $L$ such that

$$\sum_{j=L}^{\infty} N_j(X \cup Y)\, 2^{-2j} < \epsilon^2/4.$$

Let $\phi \in \Phi$ be a bijection between $X$ and $Y$, and consider the path $\gamma_\phi : [0, 1] \to \mathcal{D}_{L^2}$ where $\gamma_\phi(t)$ is the diagram with points $\{(1 - t)x + t\phi(x) : x \in X\}$. For the point $(1 - t)x + t\phi(x)$ to lie a distance at least $2^{-k}$ from the diagonal, at least one of $x$ or $\phi(x)$ must lie at least $2^{-k}$ from the diagonal. This implies that $N_k(\gamma_\phi(t)) \le N_k(X \cup Y)$ for all bijections $\phi$ and $t \in [0, 1]$. In other words, $N_k(Z) \le N_k(X \cup Y)$ for all $Z \in S$.

Now for any $Z \in S$ we have

$$d(l_{2^{-L}}(Z), \Delta)^2 \le \sum_{j=L}^{\infty} M_j(Z)\, 2^{-2j+2} \le 4\sum_{j=L}^{\infty} N_j(Z)\, 2^{-2j} \le 4\sum_{j=L}^{\infty} N_j(X \cup Y)\, 2^{-2j} < \epsilon^2.$$

Since the choice of $\alpha = 2^{-L}$ was made independently of $Z \in S$, we conclude that $S$ is uniform.

We now know that $\bar S$ (the closure of $S$) is compact. Every path $t \mapsto \gamma_\phi(t)$ is a $K_\phi$-Lipschitz map from $[0, 1]$ into $\bar S$ with $K_\phi^2 = \sum_{x \in X}\|x - \phi(x)\|^2$.

Set $K = d(X, Y) + 1$ and let $A$ be the set of $K$-Lipschitz maps from $[0, 1]$ into $\bar S$. Since $\bar S$ is compact, we know by the Arzela-Ascoli theorem that $A$ is compact. By the definition of the infimum, there exists a sequence of bijections $\{\phi_j\}$ such that $K_{\phi_j} < K$ for all $j$ and $K_{\phi_j}$ converges to $d(X, Y)$. The corresponding sequence of paths $\{\gamma_j := \gamma_{\phi_j}\}$ is a sequence of $K$-Lipschitz maps from $[0, 1]$ to $\bar S$ and hence lies in the compact set $A$. This means there must be a convergent subsequence of paths $\{\gamma_{n_j}\}$ with some limit $\gamma$, which lies in $A$ as $A$ is compact.

Since $\gamma_{n_j}(0) = X$ and $\gamma_{n_j}(1) = Y$ for all $j$ (as they are all paths from $X$ to $Y$), we know that $\gamma(0) = X$ and $\gamma(1) = Y$. From $d(\gamma_{n_j}(t), \gamma_{n_j}(s)) \le K_{\phi_{n_j}}|s - t|$ for all $s, t \in [0, 1]$ and all $j$, and the limit $K_{\phi_{n_j}} \to d(X, Y)$ as $j \to \infty$, we can infer

$$d(\gamma(t), \gamma(s)) \le d(X, Y)\,|s - t|$$

for all $s, t \in [0, 1]$. By following the path $\gamma$ to see where each point $x \in X$ ends up in $Y$, we can construct a bijection $\phi$ from points in $X$ to points in $Y$. This bijection achieves the infimum in (1). □